Executing SPARQL Queries over the Web of Linked Data
Olaf Hartig*
Christian Bizer˚
Johann-Christoph Freytag*
*Humboldt-Universität zu Berlin ˚Freie Universität Berlin
●   Use URIs as names for things.
●   Use HTTP URIs so that people can look up those names.
●   When someone looks up a URI, provide useful information.
●   Include links to other URIs so that they can discover more things.
    Tim Berners-Lee, July 2006

[Figure: the "My Movie DB" dataset]
Olaf Hartig - Executing SPARQL Queries over the Web of Linked Data
[Figure: entities in My Movie DB are named with HTTP URIs: http://mymovie.db/movie1342, http://mymovie.db/movie0362, http://mymovie.db/movie5112, http://mymovie.db/movie2449. The URI http://mymovie.db/movie2449 is looked up. URIs of a second dataset appear alongside: http://geo.db/country21, http://geo.db/country7, http://geo.db/cityCJ, http://geo.db/cityXA.]

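Looking up a URI, as in the figure, amounts to an HTTP GET that asks for an RDF representation. The following is a minimal sketch using only the Python standard library. The helper name `build_lookup_request` and the chosen Accept header value are assumptions for illustration; the slides do not prescribe them. The request is only constructed, not sent, because http://mymovie.db/ is the slides' illustrative host.

```python
from urllib.request import Request

# Hypothetical helper: prepare an HTTP GET that asks for RDF data about `uri`.
def build_lookup_request(uri: str) -> Request:
    # Content negotiation (assumed preference): Turtle first, RDF/XML as fallback.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept}, method="GET")

req = build_lookup_request("http://mymovie.db/movie2449")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Against a real Linked Data server, passing `req` to `urllib.request.urlopen` would perform the lookup and return the RDF document describing the entity.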
●   The Web: a huge, globally distributed dataspace
●   Querying this dataspace opens new possibilities:
    ●   Aggregating data from different sources
    ●   Integrating fragmentary information
    ●   Achieving a more complete view

Traditional approach 1: data centralization
 ●   Querying a collection of copies from all relevant datasets
 ●   Misses unknown or new sources
 ●   Collection probably out of date
 ●   Will it scale?

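The first two drawbacks of centralization can be made concrete with a toy sketch. All source names and triples below are made up for illustration: a local collection is copied once, then the live sources change, and the copy no longer reflects them.

```python
# Made-up live sources: name -> list of (subject, predicate, object) triples.
live_sources = {
    "movies": [("movie2449", "filmingLocation", "Italy")],
}

# Centralization: copy everything once into a local collection.
central_store = [t for triples in live_sources.values() for t in triples]

# Later the Web changes: a known source is updated and a new source appears.
live_sources["movies"].append(("movie5112", "filmingLocation", "Spain"))
live_sources["geo"] = [("Italy", "statistics", "stat_it")]

# Compare the stale copy with what is published now.
live_now = [t for triples in live_sources.values() for t in triples]
missing = [t for t in live_now if t not in central_store]
print(len(central_store), len(live_now), missing)
```

Queries over `central_store` cannot see the two triples in `missing` until the collection is rebuilt.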
Traditional approach 2: federated query processing
 ●   Querying a mediator which distributes subqueries to relevant sources and integrates the results
 ●   Requires sources to provide a query service
 ●   Requires information about the sources
 ●   Misses unknown or new sources

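A mediator of this kind can be sketched as follows. This is a simplified illustration, not a real federation engine: the `Mediator` class, the `make_source` helper, and all triples are hypothetical, and each "query service" is just a local function answering a single triple pattern.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard
Service = Callable[[Pattern], List[Triple]]

def make_source(triples: List[Triple]) -> Service:
    """Stand-in for a remote query service over one dataset."""
    def answer(pattern: Pattern) -> List[Triple]:
        return [t for t in triples
                if all(p is None or p == v for p, v in zip(pattern, t))]
    return answer

class Mediator:
    def __init__(self, sources: Dict[str, Service]) -> None:
        # The registry is filled in advance; unregistered sources are never asked.
        self.sources = sources

    def query(self, pattern: Pattern) -> List[Triple]:
        results: List[Triple] = []
        for name in sorted(self.sources):      # distribute the subquery
            results.extend(self.sources[name](pattern))
        return results                         # integrate the answers

movies = make_source([("movie2449", "filmingLocation", "Italy")])
geo = make_source([("Italy", "statistics", "stat_it")])
unknown = make_source([("movie2449", "filmingLocation", "France")])  # not registered

mediator = Mediator({"movies": movies, "geo": geo})
found = mediator.query(("movie2449", "filmingLocation", None))
print(found)
```

The triple held by `unknown` never reaches the result, because the mediator only contacts sources it was told about beforehand.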
Main drawback:
 ●   You have to know the relevant data sources in advance.
 ●   You restrict yourself to the selected sources.
 ●   You do not tap the full potential of the Web!

A novel approach: Link Traversal Based Query Execution
 ●   Allows data sources to be discovered at runtime

Outline
     Part I
        Overview of Link Traversal based Query Execution
     Part II
        An Iterator based Implementation Approach

Main Idea
●   Intertwine query evaluation with traversal of RDF links
●   Alternately:
    ●   Evaluate parts of the query on a continuously augmented set of data
    ●   Look up URIs in intermediate solutions and add retrieved data to the queried data set

[Figure: example query, shown as a graph pattern]
    http://.../movie2449  filmingLocation  ?loc .
    ?loc  statistics  ?stat .
    ?stat  unemp_rate  ?ur .

[Figure: the URI http://.../movie2449 from the query is looked up.]

[Figure: the retrieved data is added to the queried data set:]
    http://.../movie2449  filmingLocation  http://geo.../Italy

[Figure: evaluating the first part of the query on the queried data yields the intermediate solution ?loc = http://geo.../Italy.]

[Figure: the URI http://geo.../Italy from the intermediate solution is looked up.]

[Figure: the retrieved data is added to the queried data set:]
    http://geo.../Italy  statistics  http://stat.db/.../it

[Figure: evaluating the next part of the query extends the intermediate solution: ?loc = http://geo.../Italy, ?stat = http://stat.db/.../it.]

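The walkthrough above can be sketched in code. This is a simplified illustration under stated assumptions, not the authors' implementation: the Web is simulated by a dictionary (`WEB`), the URIs for Italy and its statistics are hypothetical stand-ins for the elided ones on the slides, the unemployment-rate value is a placeholder, and triple patterns are evaluated one after another in query order.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

MOVIE = "http://mymovie.db/movie2449"
ITALY = "http://geo.example/Italy"      # hypothetical stand-in for http://geo.../Italy
STAT_IT = "http://stat.example/it"      # hypothetical stand-in for http://stat.db/.../it

# Simulated Web (assumption): looking up a URI returns the triples published there.
WEB: Dict[str, Set[Triple]] = {
    MOVIE: {(MOVIE, "filmingLocation", ITALY)},
    ITALY: {(ITALY, "statistics", STAT_IT)},
    STAT_IT: {(STAT_IT, "unemp_rate", "rate-placeholder")},  # placeholder, not real data
}

QUERY: List[Triple] = [
    (MOVIE, "filmingLocation", "?loc"),
    ("?loc", "statistics", "?stat"),
    ("?stat", "unemp_rate", "?ur"),
]

def is_var(term: str) -> bool:
    return term.startswith("?")

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, v) != v:
                return None              # variable already bound to another value
        elif p != v:
            return None                  # constant does not match
    return extended

def lookup(uri: str, dataset: Set[Triple], seen: Set[str]) -> None:
    """Look up `uri` once and add the retrieved data to the queried data set."""
    if uri.startswith("http") and uri not in seen:
        seen.add(uri)
        dataset |= WEB.get(uri, set())

def execute(query: List[Triple]) -> Tuple[List[Binding], Set[str]]:
    dataset: Set[Triple] = set()
    seen: Set[str] = set()
    for pattern in query:                # seed: look up URIs mentioned in the query
        for term in pattern:
            if not is_var(term):
                lookup(term, dataset, seen)
    solutions: List[Binding] = [{}]
    for pattern in query:                # evaluate one part of the query at a time
        extended: List[Binding] = []
        for binding in solutions:
            for triple in list(dataset): # copy: the data set grows during the loop
                result = match(pattern, triple, binding)
                if result is not None:
                    for value in result.values():
                        lookup(value, dataset, seen)   # traverse links
                    extended.append(result)
        solutions = extended
    return solutions, seen

solutions, seen = execute(QUERY)
print(solutions)
print(sorted(seen))
```

Only the movie URI is known when execution starts; the other two sources are discovered through the intermediate solutions. The sketch does not revisit earlier patterns when data arrives late, which a complete engine would have to consider.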
In a Nutshell
 ●   Link traversal based query execution:
     ●   Evaluation on a continuously augmented dataset
     ●   Discovery of potentially relevant data during execution
     ●   Discovery driven by intermediate solutions
 ●   Main advantage:
     ●   No need to know all data sources in advance

Real-World Examples
Return phone numbers of authors of ontology engineering papers at ESWC'09.

 # Namespace declarations (not shown on the slide)
 PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
 PREFIX owl:   <http://www.w3.org/2002/07/owl#>
 PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
 PREFIX swc:   <http://data.semanticweb.org/ns/swc/ontology#>
 PREFIX swrc:  <http://swrc.ontoware.org/ontology#>

 SELECT DISTINCT ?author ?phone WHERE {
     ?pub swc:isPartOf
           <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
     ?pub swc:hasTopic ?topic . ?topic rdfs:label ?topicLabel .
     FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .

     ?pub swrc:author ?author .
     { ?author owl:sameAs ?authorAlt }
     UNION
     { ?authorAlt owl:sameAs ?author }

     ?authorAlt foaf:phone ?phone
 }

 # of query results         2
 # of retrieved graphs      297
 # of accessed servers      16
 avg. execution time        1min 30sec

Outline

     Part I
        Overview of Link Traversal based Query Execution


     Part II
        An Iterator based Implementation Approach
         ➢   Introduction to the Iterator Paradigm
         ➢   Application to Link Traversal based Query Execution
         ➢   URI Prefetching
         ➢   Extension to the Iterator Paradigm
         ➢   Evaluation

Iterator based Query Execution
 ●   Iterator:
     ●   implements an operation
     ●   is a group of functions:
                OPEN, GETNEXT, CLOSE
 ●   Query execution uses
     a chain of iterators
 ●   Each iterator responsible
     for a single triple pattern

 Query:
     I1:   http://.../movie2449   filmingLocation   ?loc
     I2:   ?loc                   statistics        ?stat
     I3:   ?stat                  unemp_rate        ?ur

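The OPEN, GETNEXT, CLOSE interface and the chaining of iterators can be illustrated with a minimal Python sketch. The class names `ListScan` and `Map`, the helper `run`, and the use of `None` as the end-of-results marker are assumptions made for this illustration; they are not taken from the slides or from any particular query engine.

```python
class ListScan:
    """Leaf iterator: hands out the items of an in-memory list one at a time."""

    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.items):
            return None  # None signals "no more results" (assumed convention)
        item = self.items[self.pos]
        self.pos += 1
        return item

    def close(self):
        self.pos = len(self.items)


class Map:
    """Iterator that pulls from its predecessor and applies fn to each item."""

    def __init__(self, predecessor, fn):
        self.predecessor = predecessor
        self.fn = fn

    def open(self):
        self.predecessor.open()

    def get_next(self):
        item = self.predecessor.get_next()
        return None if item is None else self.fn(item)

    def close(self):
        self.predecessor.close()


def run(last_iterator):
    """Drive the chain by calling GETNEXT on the last iterator until it is exhausted."""
    last_iterator.open()
    results = []
    while (item := last_iterator.get_next()) is not None:
        results.append(item)
    last_iterator.close()
    return results


# A chain I1 -> I2 -> I3: each GETNEXT call on I3 pulls one item through the chain.
i1 = ListScan([1, 2, 3])
i2 = Map(i1, lambda x: x * 10)
i3 = Map(i2, lambda x: x + 1)
chain_results = run(i3)
```

Only the last iterator is called by the query engine; every item is pulled through the chain on demand, so no intermediate result set has to be materialized.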
Iterator based Query Execution

 Results from Ii-1 (μcur: the input solution currently processed by Ii for tpi):

     ?c                           ?cStats
     http://geo.db/country/US     http://stats.example.org/USstatistics
     http://geo.db/country/IT     http://stats.example.org/ITstatistics
     http://geo.db/country/IT     http://stats.db/example/It
     http://example.db/ctry/DE    http://stats.example.org/Germany

 Ii for tpi:

    Example:  tpi = ( ?loc ex:stats ?s )
              μcur = { ?p → http://ex... , ?loc → http://geo... }

    1. Substitute tpcur = μcur [ tpi ]
              tpcur = ( http://geo... ex:stats ?s )
    2. Find matching triples match(tpcur ) in queried data set
              (http://geo... ex:stats http://db...), (http://geo... ex:stats http://ex...)
    3. Create solution μ' for each t in match(tpcur )
              μ' = { ?s → http://db... }
    4. Return each μcur ∪ μ' as a result
              { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }

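The four steps can be sketched as a Python generator that plays the role of Ii. This is an illustration under stated assumptions, not the authors' implementation: triples are plain 3-tuples, variables are strings starting with `?`, solutions are dicts, and all `http://example.org/...` URIs are invented for the example (the URIs on the slide are abbreviated and are not completed here).

```python
def is_var(term):
    """Variables are written as strings with a leading '?' (assumed convention)."""
    return isinstance(term, str) and term.startswith("?")


def substitute(mu, tp):
    """Step 1: replace every variable of tp that is bound in mu."""
    return tuple(mu.get(term, term) for term in tp)


def bind(tp, triple):
    """Steps 2 and 3 for one triple: return mu' if the triple matches tp, else None."""
    mu_prime = {}
    for pattern_term, value in zip(tp, triple):
        if is_var(pattern_term):
            if mu_prime.setdefault(pattern_term, value) != value:
                return None  # the same variable would need two different values
        elif pattern_term != value:
            return None
    return mu_prime


def evaluate(tp_i, input_solutions, dataset):
    """Generator in the role of iterator Ii for triple pattern tp_i."""
    for mu_cur in input_solutions:
        tp_cur = substitute(mu_cur, tp_i)       # step 1
        for triple in dataset:                  # step 2
            mu_prime = bind(tp_cur, triple)     # step 3
            if mu_prime is not None:
                yield {**mu_cur, **mu_prime}    # step 4: mu_cur united with mu'


# Hypothetical data mirroring the shape of the slide's example.
dataset = [
    ("http://example.org/geo/IT", "ex:stats", "http://example.org/db/IT"),
    ("http://example.org/geo/IT", "ex:stats", "http://example.org/ex/IT"),
    ("http://example.org/geo/US", "ex:stats", "http://example.org/db/US"),
]
mu_cur = {"?p": "http://example.org/person/1", "?loc": "http://example.org/geo/IT"}
results = list(evaluate(("?loc", "ex:stats", "?s"), [mu_cur], dataset))
```

Because `evaluate` accepts any iterable of input solutions, a chain of iterators is obtained by passing the generator for one triple pattern as the input of the generator for the next.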
Iterator based Query Execution

 ●   Results of Ii are solutions for tp1 , … , tpi

 [Figure: iterator chain: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1]

Application to Link Traversal

 ●   The queried data set grows

 ●   Look-up Requirement:
       Do not evaluate tpcur until the
       queried data set contains all
       data that can be retrieved from
       all URIs in tpcur

 [Figure: iterator chain: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1]

Application to Link Traversal

 [Figure: table of results from Ii-1 with columns ?c and ?cStats; μcur is the input solution currently processed by Ii for tpi]

 Ii for tpi:
       1. Substitute tpcur = μcur [ tpi ]
       2. Ensure look-up requirement for tpcur
              (initiate look-ups and wait)
       3. Find matching triples match(tpcur ) in queried data set
       4. Create solution μ' for each t in match(tpcur )
       5. Return each μcur ∪ μ' as a result

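A small, self-contained sketch of these five steps follows. It assumes a simulated Web (`FAKE_WEB`, a dict from URI to the triples served for it) in place of real HTTP look-ups; all URIs, predicate names, and the literal value are invented for the illustration. Predicates are written as prefixed names and are not looked up here, which is a simplification of this sketch.

```python
# Hypothetical stand-in for the Web: URI -> triples returned when it is looked up.
FAKE_WEB = {
    "http://example.org/movie/1": [
        ("http://example.org/movie/1", "ex:filmingLocation", "http://example.org/geo/IT")],
    "http://example.org/geo/IT": [
        ("http://example.org/geo/IT", "ex:stats", "http://example.org/stats/IT")],
    "http://example.org/stats/IT": [
        ("http://example.org/stats/IT", "ex:unempRate", "8.1")],
}


def look_up(uri):
    """Simulated URI look-up; a real engine would issue an HTTP request here."""
    return FAKE_WEB.get(uri, [])


def bind(tp, triple):
    """Return the bindings mu' if the triple matches tp, else None."""
    mu_prime = {}
    for pattern_term, value in zip(tp, triple):
        if pattern_term.startswith("?"):
            if mu_prime.setdefault(pattern_term, value) != value:
                return None
        elif pattern_term != value:
            return None
    return mu_prime


def evaluate(tp_i, input_solutions, dataset, retrieved):
    """Iterator Ii for tp_i; dataset and retrieved are shared by all iterators."""
    for mu_cur in input_solutions:
        tp_cur = tuple(mu_cur.get(t, t) for t in tp_i)        # step 1
        for term in tp_cur:                                    # step 2
            if term.startswith("http://") and term not in retrieved:
                retrieved.add(term)
                dataset.update(look_up(term))  # blocks until the data has arrived
        # Iterate over a snapshot: later iterators may add triples meanwhile.
        for triple in list(dataset):                           # step 3
            mu_prime = bind(tp_cur, triple)                    # step 4
            if mu_prime is not None:
                yield {**mu_cur, **mu_prime}                   # step 5


query = [
    ("http://example.org/movie/1", "ex:filmingLocation", "?loc"),
    ("?loc", "ex:stats", "?stat"),
    ("?stat", "ex:unempRate", "?ur"),
]
dataset, retrieved = set(), set()
solutions = [{}]                 # the first iterator starts from the empty solution
for tp in query:
    solutions = evaluate(tp, solutions, dataset, retrieved)
answers = list(solutions)
```

The queried data set starts empty and contains three triples at the end, although only the URI mentioned in the query was known in advance; the other two URIs were discovered through intermediate solutions.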
Blocked Query Execution
 ●   Waiting for URI look-ups
     blocks query execution

 [Figure: iterator chain: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1; annotation: "Initiate look-ups and wait"]

URI Prefetching
 ●   Waiting for URI look-ups
     blocks query execution
 ●   URI prefetching: when a URI
     is bound to a variable, initiate
     look-up in the background

 [Figure: iterator chain: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1; annotations: "Initiate look-up", "Ensure look-up is finished", "Initiate look-ups and wait"]

URI Prefetching

 [Figure: table of results from Ii-1 with columns ?c and ?cStats; μcur is the input solution currently processed by Ii for tpi]

 Ii for tpi:
       1. Substitute tpcur = μcur [ tpi ]
       2. Ensure look-up requirement for tpcur
       3. Find matching triples match(tpcur ) in queried data set
       4. Create solution μ' for each t in match(tpcur )
       5. Initiate parallel look-up for each new URI in μ'
       6. Return each μcur ∪ μ' as a result

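One way to sketch step 5 is a small look-up manager built on Python's standard `concurrent.futures.ThreadPoolExecutor`. The class `LookupManager`, its method names, the simulated Web `FAKE_WEB`, and the artificial delay are assumptions made for this illustration; the slides do not prescribe how background look-ups are implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the Web: URI -> triples served for that URI.
FAKE_WEB = {
    "http://example.org/geo/IT": [
        ("http://example.org/geo/IT", "ex:stats", "http://example.org/stats/IT")],
    "http://example.org/stats/IT": [
        ("http://example.org/stats/IT", "ex:unempRate", "8.1")],
}


def slow_look_up(uri):
    time.sleep(0.01)  # simulated network latency
    return FAKE_WEB.get(uri, [])


class LookupManager:
    """Starts each URI look-up at most once and lets callers wait for it."""

    def __init__(self, look_up, dataset, max_workers=4):
        self._look_up = look_up
        self._dataset = dataset
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def _fetch(self, uri):
        triples = self._look_up(uri)
        with self._lock:
            self._dataset.update(triples)

    def prefetch(self, uri):
        """Step 5: start a background look-up unless one was already started."""
        with self._lock:
            if uri not in self._futures:
                self._futures[uri] = self._pool.submit(self._fetch, uri)
            return self._futures[uri]

    def ensure(self, uri):
        """Step 2: block until the look-up for uri has finished."""
        self.prefetch(uri).result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


dataset = set()
manager = LookupManager(slow_look_up, dataset)

manager.ensure("http://example.org/geo/IT")            # step 2 in iterator Ii
mu_prime = {"?stat": "http://example.org/stats/IT"}    # pretend outcome of steps 3-4
for value in mu_prime.values():                        # step 5
    if value.startswith("http://"):
        manager.prefetch(value)

# Later, in iterator Ii+1: waits only for whatever part of the look-up is still running.
manager.ensure("http://example.org/stats/IT")
manager.shutdown()
```

If the prefetch has already completed by the time the next iterator needs the data, `ensure` returns immediately; otherwise execution still blocks for the remaining look-up time.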
URI Prefetching
 ●   Even with URI prefetching
     query execution may block
 ●   Possible solutions:
     ●   Program parallelism
     ●   Asynchronous pipeline
 ●   Drawback: requires major
     rewrite of existing
     query engines

 [Figure: iterator chain: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1; annotation: "Wait until look-up is finished"]

Postponing Iterator


 ●   Enabled by an extension of the iterator paradigm:
     ●   New function POSTPONE: take the most recently provided
                                result back
     ●   Adjusted GETNEXT: either return the next result or return
                           a formerly postponed result



 ●   POSTPONE allows an iterator to temporarily reject input solution μcur

Postponing Iterator

 [Figure: table of results from Ii-1 with columns ?c and ?cStats; μcur is the input solution currently processed by Ii for tpi]

 Ii for tpi:
 1. Substitute tpcur = μcur [ tpi ]
 2. POSTPONE μcur if look-up requirement doesn't hold for tpcur
 3. Find matching triples match(tpcur ) in queried data set
 4. Create solution μ' for each t in match(tpcur )
 5. Initiate parallel look-up for each new URI in μ'
 6. Return each μcur ∪ μ' as a result

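The interplay of POSTPONE and the adjusted GETNEXT can be sketched as follows. The slides do not say when GETNEXT should prefer a postponed result over a fresh one; this sketch assumes fresh results come first and postponed ones are re-offered in FIFO order once the source is exhausted. The class name, the `checks_needed` table that simulates look-ups finishing over time, and all URIs are invented for the illustration.

```python
from collections import deque


class PostponingIterator:
    """Wraps a source of solutions and adds POSTPONE to the usual GETNEXT."""

    def __init__(self, solutions):
        self._source = iter(solutions)
        self._postponed = deque()
        self._last = None
        self._exhausted = False

    def get_next(self):
        """Return the next fresh result, else a formerly postponed one, else None."""
        if not self._exhausted:
            try:
                self._last = next(self._source)
                return self._last
            except StopIteration:
                self._exhausted = True
        if self._postponed:
            self._last = self._postponed.popleft()
            return self._last
        self._last = None
        return None

    def postpone(self):
        """Take the most recently provided result back; it is offered again later."""
        if self._last is not None:
            self._postponed.append(self._last)
            self._last = None


# Simulated look-up state: number of checks that still fail for each URI.
checks_needed = {
    "http://example.org/geo/US": 2,
    "http://example.org/geo/IT": 0,
    "http://example.org/geo/DE": 0,
}


def lookup_finished(uri):
    if checks_needed[uri] > 0:
        checks_needed[uri] -= 1
        return False
    return True


def consume(predecessor):
    """Successor iterator: records the order in which inputs are processed."""
    processed = []
    while (mu_cur := predecessor.get_next()) is not None:
        if not lookup_finished(mu_cur["?loc"]):
            predecessor.postpone()   # step 2: reject mu_cur for now
            continue
        processed.append(mu_cur["?loc"])  # steps 3-6 would run here
    return processed


inputs = [{"?loc": uri} for uri in checks_needed]
order = consume(PostponingIterator(inputs))
```

The solution whose look-up is still pending is set aside, the two others are processed first, and the postponed one is handled last. A real engine would wait on the pending look-up once only postponed solutions remain, rather than re-checking in a loop as this sketch does.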
Evaluation
 ●   Implementation: Semantic Web Client Library (SWClLib)
          http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
 ●   Berlin SPARQL Benchmark (BSBM)
     ●   Simulates e-commerce scenario
     ●   Mix of 12 SPARQL queries
     ●   Generates datasets of different sizes (scaling factor)
 ●   Simulation of the Web of Linked Data
     ●   Linked Data server publishes BSBM datasets
 ●   Experiment
     ●   Adjusted BSBM queries link to the simulation server
     ●   Execute query mix with SWClLib

Evaluation

 [Figure: avg. execution time per query mix in seconds (y-axis, 0 to 250) over the
  BSBM scaling factor (x-axis, 10 to 60) for four configurations: w/o prefetching;
  w/ prefetching; non-blocking + prefetching; all data retrieved in advance]

     scal. factor   # of triples   # of entities
          10            4,971             613
          20            8,485             928
          30           11,999           1,245
          40           16,918           1,845
          50           22,616           2,599
          60           26,108           2,914

Take-away Summary
 ●   Novel query execution approach for the Web of Data:
     ●   Utilizes the characteristics of the Web
     ●   Traverses RDF links during query execution
     ●   Discovery of new data sources
     ●   No need to know all data sources in advance
 ●   Implementation approach:
     ●   Iterator based execution with URI Prefetching
     ●   Extension of the iterator paradigm (POSTPONE)
 ●   New research challenges:
     ●   Improving result completeness
     ●   Investigating suitable caching strategies
Try it!


 ●   SQUIN                                                           http://squin.org
     ●   Provides SWClLib functionality as a Web service
     ●   Accessible like a SPARQL endpoint
 ●   Public SQUIN service at
                      http://squin.informatik.hu-berlin.de/SQUIN/




These slides have been created by
                                      Olaf Hartig

                                             http://olafhartig.de


                     This work is licensed under a
       Creative Commons Attribution-Share Alike 3.0 License
           (http://creativecommons.org/licenses/by-sa/3.0/)




How Caching Improves Efficiency and Result Completeness for Querying Linked Data
 
A Main Memory Index Structure to Query Linked Data
A Main Memory Index Structure to Query Linked DataA Main Memory Index Structure to Query Linked Data
A Main Memory Index Structure to Query Linked Data
 
Towards a Data-Centric Notion of Trust in the Semantic Web (A Position Statem...
Towards a Data-Centric Notion of Trust in the Semantic Web (A Position Statem...Towards a Data-Centric Notion of Trust in the Semantic Web (A Position Statem...
Towards a Data-Centric Notion of Trust in the Semantic Web (A Position Statem...
 
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)
Brief Introduction to the Provenance Vocabulary (for W3C prov-xg)
 
Querying Linked Data with SPARQL (2010)
Querying Linked Data with SPARQL (2010)Querying Linked Data with SPARQL (2010)
Querying Linked Data with SPARQL (2010)
 

Executing SPARQL Queries over the Web of Linked Data

  • 1. Executing SPARQL Queries over the Web of Linked Data Olaf Hartig* Christian Bizer˚ Johann-Christoph Freytag* *Humboldt-Universität zu Berlin ˚Freie Universität Berlin
  • 2. Linked Data principles (Tim Berners-Lee, July 2006)
      ● Use URIs as names for things.
      ● Use HTTP URIs so that people can look up those names.
      ● When someone looks up a URI, provide useful information.
      ● Include links to other URIs so that they can discover more things.
  • 3-8. Example built up step by step next to the four principles (animation builds)
      ● The data set "My Movie DB" names its movies with HTTP URIs: http://mymovie.db/movie1342, http://mymovie.db/movie0362, http://mymovie.db/movie5112, http://mymovie.db/movie2449
      ● A client looks up http://mymovie.db/movie2449.
      ● The data links to URIs from another namespace: http://geo.db/country21, http://geo.db/country7, http://geo.db/cityCJ, http://geo.db/cityXA
  • 9. The Web: a huge, globally distributed dataspace
      ● Querying this dataspace opens new possibilities:
          ● Aggregating data from different sources
          ● Integrating fragmentary information
          ● Achieving a more complete view
  • 10-11. Traditional approach 1: data centralization
      ● Querying a collection of copies from all relevant datasets
      ● Misses unknown or new sources
      ● Collection probably out of date
      ● Will it scale?
  • 12-13. Traditional approach 2: federated query processing
      ● Querying a mediator which distributes subqueries to relevant sources and integrates the results
      ● Requires sources to provide a query service
      ● Requires information about the sources
      ● Misses unknown or new sources
  • 14. Main drawback: You have to know the relevant data sources in advance. You restrict yourself to the selected sources. You do not tap the full potential of the Web!
  • 15. A novel approach: Link Traversal Based Query Execution. Allows data sources to be discovered at runtime.
  • 16. Outline
      ● Part I: Overview of Link Traversal based Query Execution
      ● Part II: An Iterator based Implementation Approach
  • 17. Main Idea
      ● Intertwine query evaluation with traversal of RDF links
      ● Alternately:
          ● Evaluate parts of the query on a continuously augmented set of data
          ● Look up URIs in intermediate solutions and add retrieved data to the queried data set
  • 18-31. Main Idea: worked example (animation builds)
      ● Query with three triple patterns:
          ( http://.../movie2449  filmingLocation  ?loc )
          ( ?loc  statistics  ?stat )
          ( ?stat  unemp_rate  ?ur )
      ● Look up http://.../movie2449 and add the retrieved data to the queried data set.
      ● The queried data now contains the triple ( http://.../movie2449  filmingLocation  http://geo.../Italy ), which yields the intermediate solution ?loc = http://geo.../Italy.
      ● Look up http://geo.../Italy and add the retrieved data to the queried data set.
      ● The queried data now contains the triple ( http://geo.../Italy  statistics  http://stat.db/.../it ), which yields the intermediate solution ?loc = http://geo.../Italy, ?stat = http://stats.db/../it.
  • 32. In a Nutshell
      ● Link traversal based query execution:
          ● Evaluation on a continuously augmented dataset
          ● Discovery of potentially relevant data during execution
          ● Discovery driven by intermediate solutions
      ● Main advantage:
          ● No need to know all data sources in advance
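The alternation between evaluating the query and looking up URIs (slides 17 to 32) can be sketched in Python. This is only an illustration under stated assumptions: the dictionary `WEB` stands in for HTTP look-ups, the full URIs, the `ex:` predicate names and the value `UR-PLACEHOLDER` are hypothetical, and the naive fixpoint loop is not the iterator based implementation presented in Part II.

```python
# Toy stand-in for the Web: URI -> triples returned by a look-up (hypothetical data).
WEB = {
    "http://mymovie.db/movie2449": [
        ("http://mymovie.db/movie2449", "ex:filmingLocation", "http://geo.db/Italy"),
    ],
    "http://geo.db/Italy": [
        ("http://geo.db/Italy", "ex:statistics", "http://stats.db/it"),
    ],
    "http://stats.db/it": [
        ("http://stats.db/it", "ex:unemp_rate", "UR-PLACEHOLDER"),
    ],
}

QUERY = [
    ("http://mymovie.db/movie2449", "ex:filmingLocation", "?loc"),
    ("?loc", "ex:statistics", "?stat"),
    ("?stat", "ex:unemp_rate", "?ur"),
]


def is_uri(term):
    return term.startswith("http://")


def match(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, else None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def solutions_per_prefix(query, data):
    """Yield the solutions for tp1, then for tp1..tp2, and so on, over `data`."""
    current = [{}]
    for tp in query:
        extended = []
        for mu in current:
            for triple in data:
                m = match(tp, triple, mu)
                if m is not None:
                    extended.append(m)
        current = extended
        yield current


def execute(query, web):
    data = set()        # the queried data set; it grows during execution
    looked_up = set()
    pending = {term for tp in query for term in tp if is_uri(term)}
    while pending:
        # Look up URIs and add the retrieved data to the queried data set.
        for uri in pending:
            data.update(web.get(uri, []))
        looked_up |= pending
        # Evaluate parts of the query; URIs in intermediate solutions drive discovery.
        pending = set()
        for partial in solutions_per_prefix(query, data):
            for mu in partial:
                pending |= {v for v in mu.values() if is_uri(v)}
        pending -= looked_up
    return list(solutions_per_prefix(query, data))[-1]


results = execute(QUERY, WEB)
```

Only the movie URI is known from the query itself; the geo and statistics documents are discovered at runtime through the intermediate solutions.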
  • 33. Real-World Examples
      Return phone numbers of authors of ontology engineering papers at ESWC'09.
          SELECT DISTINCT ?author ?phone WHERE {
            ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
            ?pub swc:hasTopic ?topic .
            ?topic rdfs:label ?topicLabel .
            FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
            ?pub swrc:author ?author .
            { ?author owl:sameAs ?authorAlt }
            UNION
            { ?authorAlt owl:sameAs ?author }
            ?authorAlt foaf:phone ?phone
          }
      # of query results: 2
      # of retrieved graphs: 297
      # of accessed servers: 16
      avg. execution time: 1min 30sec
  • 34. Outline
      ● Part I: Overview of Link Traversal based Query Execution
      ● Part II: An Iterator based Implementation Approach
          ➢ Introduction to the Iterator Paradigm
          ➢ Application to Link Traversal based Query Execution
          ➢ URI Prefetching
          ➢ Extension to the Iterator Paradigm
          ➢ Evaluation
  • 35-37. Iterator based Query Execution
      ● Iterator:
          ● implements an operation
          ● is a group of functions: OPEN, GETNEXT, CLOSE
      ● Query execution uses a chain of iterators (I1, I2, I3)
      ● Each iterator is responsible for a single triple pattern:
          I1: ( http://.../movie2449  filmingLocation  ?loc )
          I2: ( ?loc  statistics  ?stat )
          I3: ( ?stat  unemp_rate  ?ur )
  • 38-43. Iterator based Query Execution: how Ii processes tpi (animation builds)
      Results from Ii-1 (μcur is the solution currently taken from this input):
          ?c                        | ?cStats
          http://geo.db/country/US  | http://stats.example.org/USstatistics
          http://geo.db/country/IT  | http://stats.example.org/ITstatistics
          http://geo.db/country/IT  | http://stats.db/example/It
          http://example.db/ctry/DE | http://stats.example.org/Germany
      Example: tpi = ( ?loc ex:stats ?s ), μcur = { ?p → http://ex... , ?loc → http://geo... }
      1. Substitute tpcur = μcur [ tpi ]
         tpcur = ( http://geo... ex:stats ?s )
      2. Find matching triples match(tpcur ) in queried data set
         (http://geo... ex:stats http://db...), (http://geo... ex:stats http://ex...)
      3. Create solution μ' for each t in match(tpcur )
         μ' = { ?s → http://db... }
      4. Return each μcur ∪ μ' as a result
         { ?p → http://ex... , ?loc → http://geo.db/... , ?s → http://db... }
  • 44. Iterator based Query Execution
      ● Results of Ii are solutions for tp1 , … , tpi
      ● Chain: Ii-1 for tpi-1, Ii for tpi, Ii+1 for tpi+1
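The four steps of an iterator Ii and the chaining of iterators can be sketched as follows. This is a minimal sketch, not the code of any existing engine: the class name `TriplePatternIterator`, the helper `run`, the full URIs and the `ex:` names are assumptions made for illustration, and the queried data set is a fixed list here.

```python
def match(tp_cur, triple):
    """Return the bindings mu' that make tp_cur equal to triple, else None."""
    mu_prime = {}
    for p, t in zip(tp_cur, triple):
        if p.startswith("?"):
            if mu_prime.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu_prime


class TriplePatternIterator:
    """Iterator I_i for one triple pattern tp_i with OPEN, GETNEXT, CLOSE."""

    def __init__(self, pattern, data, predecessor=None):
        self.pattern = pattern
        self.data = data                # queried data set (list of triples)
        self.predecessor = predecessor  # I_{i-1}, or None for I_1
        self.buffer = []
        self.exhausted = False

    def open(self):
        self.buffer = []
        self.exhausted = False
        if self.predecessor is not None:
            self.predecessor.open()

    def _next_input(self):
        if self.predecessor is None:
            # I_1 starts from a single empty solution.
            if self.exhausted:
                return None
            self.exhausted = True
            return {}
        return self.predecessor.get_next()

    def get_next(self):
        while not self.buffer:
            mu_cur = self._next_input()
            if mu_cur is None:
                return None
            # 1. Substitute: tp_cur = mu_cur[tp_i]
            tp_cur = tuple(mu_cur.get(term, term) for term in self.pattern)
            # 2. Find matching triples, 3. create mu', 4. combine mu_cur with mu'
            for triple in self.data:
                mu_prime = match(tp_cur, triple)
                if mu_prime is not None:
                    self.buffer.append({**mu_cur, **mu_prime})
        return self.buffer.pop(0)

    def close(self):
        self.buffer = []
        if self.predecessor is not None:
            self.predecessor.close()


def run(root):
    """Drain the last iterator of a chain and collect its results."""
    root.open()
    results = []
    while (mu := root.get_next()) is not None:
        results.append(mu)
    root.close()
    return results


DATA = [
    ("http://ex.org/movie", "ex:filmingLocation", "http://geo.db/IT"),
    ("http://geo.db/IT", "ex:stats", "http://db.org/it"),
    ("http://geo.db/IT", "ex:stats", "http://ex.org/it"),
]
i1 = TriplePatternIterator(("?p", "ex:filmingLocation", "?loc"), DATA)
i2 = TriplePatternIterator(("?loc", "ex:stats", "?s"), DATA, predecessor=i1)
results = run(i2)
```

Each result of `i2` binds `?p`, `?loc` and `?s`, that is, it is a solution for both triple patterns, in line with slide 44.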
  • 46-47. Application to Link Traversal
      ● The queried data set grows
      ● Look-up requirement: Do not evaluate tpcur until the queried data set contains all data that can be retrieved from all URIs in tpcur
  • 48-50. Application to Link Traversal: steps of Ii for tpi (same input table as on slide 38)
      1. Substitute tpcur = μcur [ tpi ]
      2. Ensure look-up requirement for tpcur (initiate look-ups and wait)
      3. Find matching triples match(tpcur ) in queried data set
      4. Create solution μ' for each t in match(tpcur )
      5. Return each μcur ∪ μ' as a result
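Step 2, ensuring the look-up requirement, can be sketched as a blocking helper that runs before matching. The names `fetch`, `ensure_lookup_requirement` and `evaluate`, the toy `WEB` dictionary and the URIs are hypothetical; a real `fetch` would issue an HTTP request, and the matching below ignores repeated variables to stay short.

```python
# Toy stand-in for the Web (hypothetical data).
WEB = {
    "http://geo.db/IT": {
        ("http://geo.db/IT", "http://ex.org/stats", "http://stats.db/it"),
    },
}
fetched = []  # records every look-up so the blocking calls are visible


def fetch(uri):
    """Hypothetical blocking look-up of a URI."""
    fetched.append(uri)
    return WEB.get(uri, set())


def is_uri(term):
    return term.startswith("http://")


def ensure_lookup_requirement(tp_cur, data, looked_up):
    """Look up every URI in tp_cur that has not been looked up yet."""
    for term in tp_cur:
        if is_uri(term) and term not in looked_up:
            data.update(fetch(term))  # execution waits here
            looked_up.add(term)


def evaluate(tp_cur, data, looked_up):
    ensure_lookup_requirement(tp_cur, data, looked_up)          # step 2
    return [t for t in data                                     # step 3
            if all(p.startswith("?") or p == x for p, x in zip(tp_cur, t))]


data, looked_up = set(), set()
tp_cur = ("http://geo.db/IT", "http://ex.org/stats", "?s")
matches = evaluate(tp_cur, data, looked_up)
```

Both the subject URI and the predicate URI are looked up, since the requirement covers all URIs in tpcur; a second call with the same pattern triggers no further look-ups.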
  • 51. Blocked Query Execution
      ● Waiting for URI look-ups blocks query execution (diagram: chain Ii-1, Ii, Ii+1; an iterator initiates look-ups and waits)
  • 53. URI Prefetching
      ● Waiting for URI look-ups blocks query execution
      ● URI prefetching: when a URI is bound to a variable, initiate the look-up in the background
      ● Diagram: one iterator in the chain initiates the look-up; a later iterator only ensures that the look-up is finished, instead of initiating look-ups and waiting
  • 54. URI Prefetching: steps of Ii for tpi (same input table as on slide 38)
      1. Substitute tpcur = μcur [ tpi ]
      2. Ensure look-up requirement for tpcur
      3. Find matching triples match(tpcur ) in queried data set
      4. Create solution μ' for each t in match(tpcur )
      5. Initiate parallel look-up for each new URI in μ'
      6. Return each μcur ∪ μ' as a result
  • 55-58. URI Prefetching
      ● Even with URI prefetching, query execution may block: an iterator has to wait until the look-up is finished
      ● Possible solutions:
          ● Program parallelism
          ● Asynchronous pipeline
      ● Drawback: requires major rewrite of existing query engines
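URI prefetching can be sketched with a thread pool from Python's standard library: a look-up is started as soon as a URI is bound, and the consuming iterator later only waits for the corresponding future. The class `PrefetchingLookups`, the function `fetch`, the toy `WEB` dictionary and the URIs are assumptions for illustration; the sketch also assumes that `prefetch` and `ensure` are called from the single thread that runs the iterator chain.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for the Web (hypothetical data).
WEB = {
    "http://geo.db/IT": [("http://geo.db/IT", "ex:stats", "http://stats.db/it")],
}
fetched = []


def fetch(uri):
    """Hypothetical look-up; runs in a background thread."""
    fetched.append(uri)
    return WEB.get(uri, [])


class PrefetchingLookups:
    def __init__(self, fetch, data, max_workers=4):
        self._fetch = fetch
        self._data = data       # queried data set (a set of triples)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}      # URI -> Future of its look-up

    def prefetch(self, uri):
        """Initiate a background look-up unless one was already started."""
        if uri not in self._futures:
            self._futures[uri] = self._pool.submit(self._fetch, uri)

    def ensure(self, uri):
        """Wait until the look-up of uri is finished and its data is added."""
        self.prefetch(uri)
        self._data.update(self._futures[uri].result())  # may still block

    def shutdown(self):
        self._pool.shutdown()


data = set()
lookups = PrefetchingLookups(fetch, data)
lookups.prefetch("http://geo.db/IT")  # step 5 of the iterator that binds the URI
lookups.prefetch("http://geo.db/IT")  # duplicate request, no second look-up
lookups.ensure("http://geo.db/IT")    # step 2 of the iterator that uses the URI
lookups.shutdown()
```

The call to `result()` in `ensure` is where execution can still block if the look-up has not finished, which is the situation slides 55 to 58 describe.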
  • 60. Postponing Iterator ● Enabled by an extension of the iterator paradigm: ● New function POSTPONE: takes back the most recently provided result ● Adjusted GETNEXT: returns either the next result or a formerly postponed result ● POSTPONE allows an iterator to temporarily reject the input solution μ_cur
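A minimal Python sketch of the extended interface from slide 60 follows. The class name `PostponableIterator` and its methods are hypothetical. The slide does not say when GETNEXT should prefer a postponed result over a fresh one, so this sketch assumes one possible policy: serve fresh results first and fall back to postponed ones once the input is exhausted. `None` is used as the end marker, so solutions themselves must not be `None`.

```python
from collections import deque

class PostponableIterator:
    """Wraps an input iterator and adds POSTPONE to the usual GETNEXT."""

    def __init__(self, source):
        self._source = iter(source)
        self._postponed = deque()   # solutions that were taken back
        self._last = None           # most recently provided result
        self._exhausted = False

    def get_next(self):
        """GETNEXT: the next fresh result, else a formerly postponed one, else None."""
        if not self._exhausted:
            try:
                self._last = next(self._source)
                return self._last
            except StopIteration:
                self._exhausted = True
        if self._postponed:
            self._last = self._postponed.popleft()
            return self._last
        self._last = None
        return None

    def postpone(self):
        """POSTPONE: take the most recently provided result back for later."""
        if self._last is not None:
            self._postponed.append(self._last)
            self._last = None

# Usage: the look-up for "IT" is assumed to be unfinished on the first attempt.
unfinished = {"IT"}
solutions = PostponableIterator(["US", "IT", "DE"])
processed = []
while (mu_cur := solutions.get_next()) is not None:
    if mu_cur in unfinished:
        unfinished.discard(mu_cur)  # simulate: the look-up completes meanwhile
        solutions.postpone()        # temporarily reject mu_cur
        continue
    processed.append(mu_cur)
print(processed)
```

In this sketch the consumer handles "DE" while "IT" is postponed instead of blocking on it. When only postponed solutions remain and their look-ups are still running, a consumer would still have to wait; the simulation above sidesteps that case.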
  • 61. Postponing Iterator ● Results from I_{i-1} as (?c, ?cStats) pairs: (http://geo.db/country/US, http://stats.example.org/USstatistics); (http://geo.db/country/IT, http://stats.example.org/ITstatistics); (http://geo.db/country/IT, http://stats.db/example/It); (http://example.db/ctry/DE, http://stats.example.org/Germany); μ_cur marks the current input solution ● Steps of I_i for tp_i: 1. Substitute: tp_cur = μ_cur[tp_i] 2. POSTPONE μ_cur if the look-up requirement does not hold for tp_cur 3. Find the matching triples match(tp_cur) in the queried data set 4. Create a solution μ' for each t in match(tp_cur) 5. Initiate a parallel look-up for each new URI in μ' 6. Return each μ_cur ∪ μ' as a result
  • 62. Outline Part I Overview of Link Traversal based Query Execution Part II An Iterator based Implementation Approach ➢ Introduction to the Iterator Paradigm ➢ Application to Link Traversal based Query Execution ➢ URI Prefetching ➢ Extension to the Iterator Paradigm ➢ Evaluation
  • 63. Evaluation ● Implementation: Semantic Web Client Library (SWClLib) http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/ ● Berlin SPARQL Benchmark (BSBM) ● Simulates an e-commerce scenario ● Mix of 12 SPARQL queries ● Generates datasets of different sizes (scaling factor) ● Simulation of the Web of Linked Data ● A Linked Data server publishes the BSBM datasets ● Experiment ● Adjusted BSBM queries link to the simulation server ● Execute the query mix with SWClLib
  • 64. Evaluation [Chart: avg. execution time per query mix in seconds (y-axis, 0 to 250) over BSBM scaling factor (x-axis, 10 to 60); series: w/o prefetching; w/ prefetching; non-blocking + prefetching; all data retrieved in advance] ● Dataset sizes (scaling factor: # of triples, # of entities): 10: 4,971, 613; 20: 8,485, 928; 30: 11,999, 1,245; 40: 16,918, 1,845; 50: 22,616, 2,599; 60: 26,108, 2,914
  • 65. Take-away Summary ● Novel query execution approach for the Web of Data: ● Utilizes the characteristics of the Web ● Traverses RDF links during query execution ● Discovery of new data sources ● No need to know all data sources in advance ● Implementation approach: ● Iterator based execution with URI Prefetching ● Extension of the iterator paradigm (POSTPONE) ● New research challenges: ● Improving result completeness ● Investigating suitable caching strategies
  • 66. Try it! ● SQUIN http://squin.org ● Provides SWClLib functionality as a Web service ● Accessible like a SPARQL endpoint ● Public SQUIN service at http://squin.informatik.hu-berlin.de/SQUIN/
  • 67. These slides have been created by Olaf Hartig http://olafhartig.de This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)