Linked Data Query Processing Strategies
Günter Ladwig, Thanh Tran
International Semantic Web Conference 2010, Shanghai


Institute of Applied Informatics and Formal Description Methods (AIFB)




KIT – University of the State of Baden-Württemberg and
National Large-scale Research Center of the Helmholtz Association      www.kit.edu
Contents

       Introduction
              Challenges
              Contributions
       Linked Data Query Processing Strategies
       Stream-based Query Processing
       Corrective Source Ranking
       Evaluation
       Conclusion




November 11th, 2010   ISWC 2010, Shanghai, China   Institute of Applied Informatics and Formal Description Methods (AIFB)
What is Linked Data?

       Linked Data Principles
              Use URIs to identify things
              Use HTTP URIs that allow dereferencing
              Dereferencing a URI provides information about the thing in a
              standard format (RDF)
              Include links to other, related URIs
       Linked Data Query Processing
              Evaluate queries directly over Linked Data
              Dereference Linked Data URIs during query processing
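A minimal sketch of what dereferencing a Linked Data URI involves at the HTTP level, assuming content negotiation via an Accept header. The header value, the helper name build_dereference_request and the "#me" fragment are illustrative assumptions, not taken from the talk. The request is only built, not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request

# Assumed Accept header; real servers may prefer other RDF serializations.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_dereference_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # The fragment part of a URI is never sent to the server.
    document_uri, _fragment = urldefrag(uri)
    return Request(document_uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical URI with a fragment identifier.
request = build_dereference_request("http://sw.org/person/AB#me")
```

Passing the request to urllib.request.urlopen would perform the actual lookup. A query engine would then parse the returned RDF and feed the triples into query processing.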




Challenges

       Volume of Source Collection
              Each URI is a potential data source
       Dynamics of Source Collection
              Sources may change rapidly over time
              Sources might only be discovered at run-time
       Heterogeneity of Sources, Source Descriptions and
       Access Methods
              Sources vary in size
              Descriptions of sources vary in completeness
              Access methods: URI lookup, SPARQL endpoints, local cache, ...




Contributions

       Discussion of Linked Data Query Processing strategies
       Mixed strategy, combining local indexes and run-time
       discovery
       Stream-based Query Processing
              Data can arrive at any time and in any order
              Suited to deal with network latency
       Corrective Source Ranking
              Deals with different types of source descriptions
              Ranking is refined at run-time




LINKED DATA QUERY PROCESSING STRATEGIES

Top-down Query Evaluation
       SELECT ?paper ?author WHERE {
          ?paper swrc:author ?author .
          ?paper swc:isPartOf ?proc .
          ?proc swc:relatedToEvent <http://sw.org/eswc/2010> .
       }

       Figure: the query is used to probe the local source index; sources are
       selected and ranked, then retrieved, and their data is joined.

       Source URI                    Score
       http://sw.org/person/AB       0.87
       ...                           ...


       Local index, assumed to be complete
              Selection and ranking of sources
              No run-time discovery
       Fast, only relevant sources are retrieved
       Not up-to-date, index size may become very large
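The top-down steps above (probe the index, select and rank sources) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the index layout, the cardinalities and the function name select_and_rank are hypothetical, and scoring by summed cardinality is a simplification.

```python
from collections import defaultdict

# Hypothetical local source index: triple pattern -> {source URI: cardinality}.
SOURCE_INDEX = {
    "?paper swrc:author ?author": {
        "http://sw.org/person/AB": 4,
        "http://sw.org/proc/eswc/2010": 1,
    },
    "?paper swc:isPartOf ?proc": {
        "http://sw.org/proc/eswc/2010": 5,
    },
}


def select_and_rank(patterns, index):
    """Return (source, score) pairs, best first; scores are scaled to [0, 1]."""
    totals = defaultdict(int)
    for pattern in patterns:
        # Probe the index: only sources known to match a pattern are selected.
        for source, cardinality in index.get(pattern, {}).items():
            totals[source] += cardinality
    if not totals:
        return []
    best = max(totals.values())
    ranked = [(source, total / best) for source, total in totals.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


ranking = select_and_rank(
    ["?paper swrc:author ?author", "?paper swc:isPartOf ?proc"], SOURCE_INDEX
)
```

Only sources that appear in the index can ever be returned, which is why this strategy is fast but cannot see sources that appeared after the index was built.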

Bottom-up Query Evaluation

       SELECT ?paper ?author WHERE {
          ?paper swrc:author ?author .
          ?paper swc:isPartOf ?proc .
          ?proc swc:relatedToEvent <http://semweb.org/eswc/2010> .
       }

       Retrieve source
              <http://sw.org/proc/eswc/2010> swc:relatedToEvent <http://sw.org/eswc/2010> .
              ...
       Discover new sources
              swc:paper1 swc:isPartOf <http://sw.org/proc/eswc/2010> .
              ...

       Sources are discovered at run-time through links
       Answers can be incomplete, as links might not be discoverable
       Slower, as unnecessary sources are retrieved
       Always up-to-date

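A rough sketch of bottom-up evaluation as link traversal, assuming an in-memory stand-in for the Web. The WEB dictionary, its URIs beyond those on the slide, and the function names are hypothetical; a real engine would issue HTTP lookups and match the retrieved triples against the query.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples returned when dereferenced.
WEB = {
    "http://sw.org/eswc/2010": [
        ("http://sw.org/proc/eswc/2010", "swc:relatedToEvent", "http://sw.org/eswc/2010"),
    ],
    "http://sw.org/proc/eswc/2010": [
        ("http://sw.org/paper1", "swc:isPartOf", "http://sw.org/proc/eswc/2010"),
    ],
    "http://sw.org/paper1": [
        ("http://sw.org/paper1", "swrc:author", "http://sw.org/person/AB"),
    ],
}


def dereference(uri):
    """Simulated URI lookup; unknown URIs yield no data."""
    return WEB.get(uri, [])


def bottom_up(seed_uris, max_sources=100):
    """Breadth-first traversal: retrieve a source, then follow the URIs it mentions."""
    queue = deque(seed_uris)
    seen = set(seed_uris)
    retrieved, triples = [], []
    while queue and len(retrieved) < max_sources:
        uri = queue.popleft()
        retrieved.append(uri)
        for s, p, o in dereference(uri):
            triples.append((s, p, o))
            for term in (s, o):
                if term.startswith("http://") and term not in seen:
                    seen.add(term)  # a new source discovered at run-time
                    queue.append(term)
    return retrieved, triples


sources, data = bottom_up(["http://sw.org/eswc/2010"])
```

The traversal starts from the URI mentioned in the query. The last lookup (the author URI) returns nothing in this toy data, which illustrates how sources are retrieved without any guarantee that they contribute to the answer.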
Mixed Strategy

       Combination of top-down and bottom-up strategies
              Partial local index of sources, not assumed to be complete
              New sources are discovered at run-time
       Addresses volume and dynamics of Linked Data
       Corrective Source Ranking
              Deal with heterogeneous source descriptions
       Stream-based Query Processing
              Deal with unpredictable nature of Linked Data access
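One way to picture the mixed strategy is a single priority queue that is seeded from the partial index and extended with sources discovered at run-time. The sketch below assumes a hypothetical dereference callback that returns the links found in a source, and a fixed default score for discovered sources; neither is taken from the talk.

```python
import heapq


def mixed_strategy(indexed_scores, dereference, discovered_score=0.1, max_sources=100):
    """Return the order in which sources are retrieved.

    indexed_scores: {source URI: score} taken from the partial local index.
    dereference:    callable returning the URIs linked from a source.
    """
    # heapq is a min-heap, so scores are negated to pop the best source first.
    heap = [(-score, uri) for uri, score in indexed_scores.items()]
    heapq.heapify(heap)
    seen = set(indexed_scores)
    order = []
    while heap and len(order) < max_sources:
        _, uri = heapq.heappop(heap)
        order.append(uri)
        for link in dereference(uri):
            if link not in seen:
                seen.add(link)
                # Discovered sources have no index entry, so they get a default score.
                heapq.heappush(heap, (-discovered_score, link))
    return order


# Hypothetical link structure between sources A..D.
LINKS = {"A": ["C"], "B": ["A", "D"], "C": []}
order = mixed_strategy({"A": 0.9, "B": 0.5}, lambda uri: LINKS.get(uri, []))
```

Indexed sources A and B are retrieved first because of their known scores; C and D are only reachable through links and enter the queue while the query is running.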




STREAM-BASED QUERY PROCESSING

Stream-based Query Processing

        Network latency
               Do not block!
               Evaluation driven by incoming data
        Compile-time
               Construct query plan
               Probe local index for sources
        Run-time
               Rank sources
               Retrieve sources
               Push data into query plan
               Discover new sources

        Figure: Query Plan
               Results <- Join( Join( worksAt(?x, dbpedia:KIT), knows(?x, ?y) ), name(?y, ?n) )
        Figure: Source Retrieval
               Source Ranker orders sources by score (Source 1: 1.0, Source 2: 0.7, ...),
               using the local source index and samples from the query plan
               Source Retrievers 1, 2, ... retrieve the ranked sources and push their data
               into the query plan
               Discovered sources are reported back to the Source Ranker

Push-based Symmetric Hash Join

        Operation
               Maintains a hash table for each input
               Arriving tuples are inserted into one hash table and then the other
               is probed for join combinations
        Push-based
               Tuples are pushed into operators from the leaves to the root of the
               query plan
               Execution driven by incoming tuples instead of results
        Results reported as soon as input tuples arrive
        Tuples can arrive on all inputs in any order

        Figure: t7 with key b is pushed on the left input

               Left input              Right input
               Key   T                 Key   T
               a     t1, t3            b     t4, t5
               b     t2, t7            c     t6

               Insert t7 into the left hash table, probe the right hash table for key b,
               push output (t7, t4) and (t7, t5)

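The operator described above can be sketched in a few lines. This is a minimal illustration, assuming tuples are pushed together with their join key and that results go to a callback; the class and parameter names are hypothetical.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Push-based symmetric hash join over two inputs, 'left' and 'right'."""

    def __init__(self, on_output):
        # One hash table per input, mapping join key -> tuples seen so far.
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}
        self.on_output = on_output  # called for every join result

    def push(self, side, key, tup):
        other = "right" if side == "left" else "left"
        # 1. Insert the arriving tuple into the hash table of its own input.
        self.tables[side][key].append(tup)
        # 2. Probe the other hash table and push out all join combinations.
        for match in self.tables[other].get(key, []):
            pair = (tup, match) if side == "left" else (match, tup)
            self.on_output(pair)


results = []
join = SymmetricHashJoin(results.append)

# Build the state shown on the slide before t7 arrives.
for side, key, tup in [
    ("left", "a", "t1"), ("left", "a", "t3"), ("left", "b", "t2"),
    ("right", "b", "t4"), ("right", "b", "t5"), ("right", "c", "t6"),
]:
    join.push(side, key, tup)

results.clear()             # ignore results produced while building the state
join.push("left", "b", "t7")
```

After the last push, results holds the two combinations from the slide, (t7, t4) and (t7, t5). Because both inputs are treated the same way, the operator never waits for one input to finish before reporting results.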
CORRECTIVE SOURCE RANKING

Corrective Source Ranking

        Prefer more relevant sources
        Relevancy of a source is based on
               Current query
               Any available intermediate results
               Overall optimization goal
        Define a set of source features and derive concrete
        source metrics
               Not all metrics are available for all sources (heterogeneity)
        Refine previously computed metrics using newly
        discovered information




Source Features and Metrics

        Source is more relevant if it contains data that contributes
        to answers of the query
               Triple Pattern Cardinality

               Join Pattern Cardinality



        Cardinalities stored in local index
        Some patterns have high cardinality for all or many
        sources (e.g.             )
               These patterns do not discriminate sources
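The formulas for the two cardinalities did not survive on this slide. The sketch below assumes the usual reading: triple pattern cardinality is the number of triples in a source matching a pattern, and join pattern cardinality is the number of pairs of matching triples that agree on the join variable. Function names and the sample triples are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def matches(pattern, triple):
    """A triple matches if every constant in the pattern equals the triple's term."""
    return all(is_var(p) or p == t for p, t in zip(pattern, triple))


def triple_pattern_cardinality(pattern, triples):
    return sum(1 for t in triples if matches(pattern, t))


def join_pattern_cardinality(p1, p2, var, triples):
    """Count pairs of triples matching p1 and p2 that agree on the join variable."""
    i, j = p1.index(var), p2.index(var)
    count = 0
    for t1 in triples:
        if not matches(p1, t1):
            continue
        for t2 in triples:
            if matches(p2, t2) and t1[i] == t2[j]:
                count += 1
    return count


# Hypothetical content of a single source.
TRIPLES = [
    ("ex:paper1", "swrc:author", "ex:AB"),
    ("ex:paper1", "swc:isPartOf", "ex:proc"),
    ("ex:paper2", "swrc:author", "ex:CD"),
    ("ex:paper3", "swc:isPartOf", "ex:proc"),
]
AUTHOR = ("?paper", "swrc:author", "?author")
PART_OF = ("?paper", "swc:isPartOf", "?proc")

tp_card = triple_pattern_cardinality(AUTHOR, TRIPLES)
join_card = join_pattern_cardinality(AUTHOR, PART_OF, "?paper", TRIPLES)
```

Here the source has two author triples, but only ex:paper1 also has an isPartOf triple, so the join pattern cardinality is lower than either triple pattern cardinality.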



Source Features and Metrics

        Adopt TF-IDF concept to obtain weights for triple patterns
               Importance positively correlates with how often bindings to a
               pattern occur in a source (i.e. cardinality)
               Importance negatively correlates with how often its bindings occur
               in all sources of the source collection S
        Triple Frequency – Inverse Source Frequency (TF-ISF)
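The TF-ISF formula itself is missing from the slide. By analogy with TF-IDF, one plausible form is card(t, s) * log(|S| / |{r in S : card(t, r) > 0}|); the sketch below uses that form as an assumption, with a hypothetical index layout and made-up cardinalities.

```python
import math


def tf_isf(pattern, source, cardinalities):
    """Assumed TF-ISF: cardinality in the source times log inverse source frequency.

    cardinalities: {source: {pattern: cardinality}} for the source collection S.
    """
    tf = cardinalities[source].get(pattern, 0)
    containing = sum(1 for cards in cardinalities.values() if cards.get(pattern, 0) > 0)
    if tf == 0 or containing == 0:
        return 0.0
    return tf * math.log(len(cardinalities) / containing)


# Hypothetical collection of three sources.
CARDS = {
    "s1": {"?x rdf:type ?c": 50, "?p swrc:author ?a": 4},
    "s2": {"?x rdf:type ?c": 80},
    "s3": {"?x rdf:type ?c": 10},
}

common = tf_isf("?x rdf:type ?c", "s2", CARDS)
rare = tf_isf("?p swrc:author ?a", "s1", CARDS)
```

A pattern that matches in every source gets weight 0 no matter how high its cardinality is, while a pattern found in only one source keeps a positive weight. That is the discriminating behaviour the slide asks for.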




Source Features and Metrics - Links

        Source linked from many other sources is more relevant
        Relevance is higher when these links match query
        predicates
        Links are only discovered at run-time
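A small sketch of a link-based score, assuming each discovered link is recorded as (linking source, predicate, linked source). The weights 1.0 and 0.25 and the function name are illustrative assumptions; the talk does not state how links are weighted.

```python
def link_score(source, links, query_predicates, match_weight=1.0, other_weight=0.25):
    """Sum weighted inbound links; links whose predicate occurs in the query count more."""
    score = 0.0
    for from_source, predicate, to_source in links:
        if to_source == source and from_source != source:  # ignore self-links
            score += match_weight if predicate in query_predicates else other_weight
    return score


# Hypothetical links discovered so far at run-time.
LINKS_SEEN = [
    ("s1", "swrc:author", "t"),
    ("s2", "foaf:knows", "t"),
    ("t", "swrc:author", "t"),
    ("s1", "swrc:author", "u"),
]

score_t = link_score("t", LINKS_SEEN, {"swrc:author"})
score_u = link_score("u", LINKS_SEEN, {"swrc:author"})
```

Since links only become known as sources are retrieved, a score like this has to be recomputed as the list of seen links grows.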




Metric Correction and Refinement

        During query processing new information becomes
        available: intermediate join results, links
               Refine and correct previously computed metrics
               Important in the case of non-discriminative patterns
        Instantiate triple pattern of a join with samples of
        intermediate results to obtain better join size estimates
        Example (figure): intermediate results in the SHJ operator are used to
        perform triple pattern cardinality lookups
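The sampling idea can be sketched as follows: take a sample of the bindings held in a join operator, substitute each into the other triple pattern, look up the resulting cardinalities, and scale up. The function name, the lookup callback and the index contents are hypothetical; the patterns reuse worksAt and knows from the query plan figure.

```python
import random


def estimate_join_size(bindings, pattern, var, lookup_cardinality, sample_size, rng=None):
    """Estimate how many join results a source contributes for the given bindings."""
    if not bindings:
        return 0.0
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    k = min(sample_size, len(bindings))
    total = 0
    for value in rng.sample(bindings, k):
        # Instantiate the pattern: replace the join variable with the sampled binding.
        instantiated = tuple(value if term == var else term for term in pattern)
        total += lookup_cardinality(instantiated)
    # Scale the sampled cardinality up to the full set of bindings.
    return total * len(bindings) / k


# Hypothetical bindings for ?x produced by worksAt(?x, dbpedia:KIT).
X_BINDINGS = ["ex:a", "ex:b", "ex:c"]
# Hypothetical per-source cardinalities for instantiated knows patterns.
SOURCE_CARDS = {("ex:a", "knows", "?y"): 2, ("ex:c", "knows", "?y"): 4}

estimate = estimate_join_size(
    X_BINDINGS, ("?x", "knows", "?y"), "?x",
    lambda p: SOURCE_CARDS.get(p, 0), sample_size=3,
)
```

With a sample as large as the binding set the estimate is exact (6 here); smaller samples trade accuracy for fewer index lookups.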




Ranking at Run-time

        Optimization goal: early result reporting
               Indexed sources: triple and join pattern cardinality, TF-ISF,
               weighted links, sampled join size estimates
               Discovered sources: weighted links
        Ranking has to be refined at run-time
        Parameters influencing behavior and cost of ranking
        process
               Invalid Score Threshold: ranking is performed when the number
               of sources with invalid scores passes a threshold
               Sample Size: larger samples for join size estimation give better
               estimates but are also more costly
               Resampling Threshold: cache join size estimates and perform
               sampling only when the hash table of a join operator grows past a
               given threshold

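How the thresholds might gate the ranking work can be sketched as a small policy object. This is an assumption-laden illustration: the class and method names are hypothetical, and "grows past a given threshold" is read here as growth since the last sampling.

```python
class RankingPolicy:
    """Decides when re-ranking and resampling are worth their cost."""

    def __init__(self, invalid_score_threshold, resampling_threshold):
        self.invalid_score_threshold = invalid_score_threshold
        self.resampling_threshold = resampling_threshold
        self.invalid_sources = set()
        self.table_size_at_last_sample = 0

    def mark_invalid(self, source):
        """Call when new information (links, join results) outdates a source's score."""
        self.invalid_sources.add(source)

    def should_rerank(self):
        return len(self.invalid_sources) > self.invalid_score_threshold

    def reranked(self):
        self.invalid_sources.clear()

    def should_resample(self, hash_table_size):
        # Assumed reading: resample once the table grew by more than the threshold.
        growth = hash_table_size - self.table_size_at_last_sample
        return growth > self.resampling_threshold

    def resampled(self, hash_table_size):
        self.table_size_at_last_sample = hash_table_size


policy = RankingPolicy(invalid_score_threshold=2, resampling_threshold=100)
for source in ("s1", "s2"):
    policy.mark_invalid(source)
rerank_after_two = policy.should_rerank()
policy.mark_invalid("s3")
rerank_after_three = policy.should_rerank()
```

Raising either threshold makes ranking cheaper but lets scores and join size estimates go stale for longer.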
EVALUATION


Evaluation

        Systems: top-down (TD), bottom-up (BU), mixed (MI)
        8 queries over various datasets (DBpedia, Geonames,
        NYT, Freebase, ...)
        To make the approaches comparable, sources were
        restricted to those discoverable by the BU approach
        ~6200 sources, containing ~500k triples
               Sources hosted on local proxy server with artificial delay of 2
               seconds
               25% of sources were randomly chosen to construct index for MI




Results

        Overall early result reporting
           25% results: MI 8.7s, BU 15.1s
           50% results: MI 12.8s, BU 22.0s
           Improvement of ~42%
        Detailed results for two queries:

                                Query 1                       Query 6
                            BU        MI        TD        BU        MI        TD
     25% Results       24810.5   10300.0   11038.0    8222.5    4743.5    5545.0
     50% Results       43464.5   40782.0   15787.0   10961.5    7650.5    5634.0
     Total             84066.5   86895.5   44323.5   24086.0   20711.0   16469.0
     Src. Selection        0.0     853.0    1444.5       0.0    1331.0    1863.5
     Ranking              25.5    2404.0     411.5      23.5     292.5     335.0
     #Sources              622       612       154       236        92        49

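The ~42% figure can be checked against the overall timings reported on this slide (MI vs. BU at 25% and 50% of results). A short worked calculation, with a hypothetical helper name:

```python
def improvement(baseline_seconds, improved_seconds):
    """Relative reduction in time compared with the baseline."""
    return (baseline_seconds - improved_seconds) / baseline_seconds


imp_25 = improvement(15.1, 8.7)    # 25% of results: BU 15.1s vs. MI 8.7s
imp_50 = improvement(22.0, 12.8)   # 50% of results: BU 22.0s vs. MI 12.8s
average = (imp_25 + imp_50) / 2

print(f"25%: {imp_25:.1%}, 50%: {imp_50:.1%}, average: {average:.1%}")
```

Both ratios come out at roughly 42%, which matches the improvement stated on the slide.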
Result Arrival Times




Ranking Heuristics




Conclusion

        Mixed strategy for Linked Data Query Processing
               Partial knowledge available beforehand, incorporated with source
               discovery at run-time
        Corrective Source Ranking
               Metrics for source relevancy
               Refinement of ranking at run-time
        Stream-based Query Processing
        Early results reported on average 42% faster
        Future work
               Adapt query plan to changing properties of incoming data
               Query local and remote data



 
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadIslamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Time
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
 
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
 
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 

Linked Data Query Processing Strategies

  • 1. Linked Data Query Processing Strategies. Günter Ladwig, Thanh Tran. International Semantic Web Conference 2010, Shanghai. Institute of Applied Informatics and Formal Description Methods (AIFB), KIT – University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association. www.kit.edu
  • 2. Contents: Introduction (Challenges, Contributions); Linked Data Query Processing Strategies; Stream-based Query Processing; Corrective Source Ranking; Evaluation; Conclusion. November 11th, 2010, ISWC 2010, Shanghai, China. Institute of Applied Informatics and Formal Description Methods (AIFB).
  • 3. What is Linked Data? Linked Data Principles: use URIs to identify things; use HTTP URIs that allow dereferencing; dereferencing a URI provides information about the thing in a standard format (RDF); include links to other, related URIs. Linked Data Query Processing: evaluate queries directly over Linked Data; dereference Linked Data URIs during query processing.
  • 4. Challenges. Volume of the source collection: each URI is a potential data source. Dynamics of the source collection: sources may change rapidly over time; sources might only be discovered at run-time. Heterogeneity of sources, source descriptions and access methods: sources vary in size; descriptions of sources vary in completeness; access methods include URI lookup, SPARQL endpoints, local cache, ...
  • 5. Contributions. Discussion of Linked Data Query Processing strategies: mixed strategy, combining local indexes and run-time discovery. Stream-based Query Processing: data can arrive at any time and in any order; suited to deal with network latency. Corrective Source Ranking: deals with different types of source descriptions; ranking is refined at run-time.
  • 6. LINKED DATA QUERY PROCESSING STRATEGIES
  • 7. Top-down Query Evaluation. Example query: SELECT ?paper ?author WHERE { ?paper swrc:author ?author . ?paper swc:isPartOf ?proc . ?proc swc:relatedToEvent <http://sw.org/eswc/2010> . } Figure: probe the local source index, select and rank sources (e.g. source URI http://sw.org/person/AB, score 0.87), retrieve sources, join data. Local index, assumed to be complete. Selection and ranking of sources. No run-time discovery. Fast: only relevant sources are retrieved. Not up-to-date; index size may become very large.
  • 8. Bottom-up Query Evaluation. Example query: SELECT ?paper ?author WHERE { ?paper swrc:author ?author . ?paper swc:isPartOf ?proc . ?proc swc:relatedToEvent <http://semweb.org/eswc/2010> . } Figure: retrieve a source, yielding e.g. <http://sw.org/proc/eswc/2010> swc:relatedToEvent <http://sw.org/eswc/2010> . ...; discover new sources, yielding e.g. swc:paper1 swc:isPartOf <http://sw.org/proc/eswc/2010> . ... Sources are discovered at run-time through links. Answers can be incomplete, as links might not be discoverable. Slower, as unnecessary sources are retrieved. Always up-to-date.
  • 9. Mixed Strategy. Combination of top-down and bottom-up strategies: partial local index of sources, not assumed to be complete; new sources are discovered at run-time; addresses the volume and dynamics of Linked Data. Corrective Source Ranking: deals with heterogeneous source descriptions. Stream-based Query Processing: deals with the unpredictable nature of Linked Data access.
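The mixed strategy can be pictured as a loop over a ranked queue of sources that is seeded from the partial index and extended by links found at run-time. The following is a minimal sketch under stated assumptions, not the authors' implementation: the in-memory `WEB` dictionary stands in for HTTP dereferencing, and `mixed_strategy`, the example URIs, the scores, and the fixed default score for discovered sources are all hypothetical.

```python
import heapq

# Hypothetical stand-in for the Web: URI -> list of (subject, predicate, object) triples.
WEB = {
    "http://example.org/a": [("http://example.org/a", "ex:knows", "http://example.org/b")],
    "http://example.org/b": [("http://example.org/b", "ex:name", "B")],
    "http://example.org/c": [("http://example.org/c", "ex:name", "C")],
}


def mixed_strategy(indexed_scores, default_score=0.1):
    """Retrieve indexed sources first, then sources discovered through links.

    indexed_scores: dict URI -> score taken from a partial local index (assumed).
    Returns (retrieval order, all triples seen).
    """
    # heapq is a min-heap, so scores are negated to pop the best source first.
    queue = [(-score, uri) for uri, score in indexed_scores.items()]
    heapq.heapify(queue)
    seen = set(indexed_scores)
    order, triples = [], []

    while queue:
        _, uri = heapq.heappop(queue)
        order.append(uri)
        for s, p, o in WEB.get(uri, []):  # "dereference" the source
            triples.append((s, p, o))
            # Any not-yet-seen URI in object position is a newly discovered source.
            if o.startswith("http://") and o not in seen:
                seen.add(o)
                heapq.heappush(queue, (-default_score, o))
    return order, triples


order, triples = mixed_strategy({"http://example.org/a": 0.9, "http://example.org/c": 0.5})
```

Here the two indexed sources are retrieved in score order, and `http://example.org/b`, which is only reachable through a link, is retrieved last with the assumed default score.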
  • 10. STREAM-BASED QUERY PROCESSING
  • 11. Stream-based Query Processing. Network latency: do not block! Evaluation is driven by incoming data. Compile-time: construct the query plan; probe the local index for sources. Run-time: rank sources; retrieve sources; push data into the query plan; discover new sources. Figure: a query plan of two joins over the patterns name(?y, ?n), worksAt(?x, dbpedia:KIT) and knows(?x, ?y) that outputs results; a source retrieval component with a source ranker holding ranked sources (Source 1, score 1.0; Source 2, score 0.7; ...), source retrievers 1 and 2, and the local source index; arrows labeled Samples, Push, Retrieve source and Source discovered connect them.
  • 12. Push-based Symmetric Hash Join. Operation: maintains a hash table for each input; arriving tuples are inserted into one hash table and then the other is probed for join combinations. Push-based: tuples are pushed into operators from the leaves to the root of the query plan; execution is driven by incoming tuples instead of results; results are reported as soon as input tuples arrive; tuples can arrive on all inputs in any order. Figure: the left input's hash table holds key a: t1, t3 and key b: t2, t7; the right input's holds key b: t4, t5 and key c: t6. Pushed on left: t7(b), which is inserted into the left table and probed against the right table, pushing the outputs (t7, t4) and (t7, t5).
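The insert-then-probe behavior of the push-based symmetric hash join can be sketched in a few lines. This is an illustrative sketch on a single join key, assuming tuples arrive as `(key, tuple)` pairs; the class name `SymmetricHashJoin` and the `emit` callback are hypothetical and not taken from the slides. The data replays the slide's example.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Push-based symmetric hash join on a single join key (illustrative sketch)."""

    def __init__(self, emit):
        # One hash table per input, mapping join key -> list of tuples.
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}
        self.emit = emit  # callback that receives each joined (left, right) pair

    def push(self, side, key, tup):
        other = "right" if side == "left" else "left"
        # 1. Insert the arriving tuple into the hash table of its own input.
        self.tables[side][key].append(tup)
        # 2. Probe the other input's hash table and push every match downstream.
        for match in self.tables[other].get(key, []):
            pair = (tup, match) if side == "left" else (match, tup)
            self.emit(pair)


results = []
join = SymmetricHashJoin(results.append)

# State from the slide before t7 arrives: left {a: t1, t3; b: t2}, right {b: t4, t5; c: t6}.
for key, tup in [("a", "t1"), ("b", "t2"), ("a", "t3")]:
    join.push("left", key, tup)
for key, tup in [("b", "t4"), ("b", "t5"), ("c", "t6")]:
    join.push("right", key, tup)

before = len(results)
join.push("left", "b", "t7")  # "Pushed on left: t7(b)"
new_results = results[before:]
```

Because neither input is treated as the build side, a tuple arriving on either input produces its join results immediately, which is what lets the operator tolerate sources that answer in arbitrary order.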
  • 13. CORRECTIVE SOURCE RANKING
  • 14. Corrective Source Ranking. Prefer more relevant sources. Relevancy of a source is based on: the current query; any available intermediate results; the overall optimization goal. Define a set of source features and derive concrete source metrics. Not all metrics are available for all sources (heterogeneity). Refine previously computed metrics using newly discovered information.
  • 15. Source Features and Metrics. A source is more relevant if it contains data that contributes to answers of the query. Metrics: triple pattern cardinality and join pattern cardinality. Cardinalities are stored in the local index. Some patterns have high cardinality for all or many sources (e.g. ). These patterns do not discriminate between sources.
  • 16. Source Features and Metrics. Adopt the TF-IDF concept to obtain weights for triple patterns. Importance correlates positively with how often bindings to a pattern occur in a source (i.e. cardinality). Importance correlates negatively with how often its bindings occur in all sources of the source collection S. Triple Frequency – Inverse Source Frequency (TF-ISF).
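The transcript does not preserve the TF-ISF formula, so the sketch below simply transfers the textbook TF-IDF form to triple patterns: the pattern's cardinality in a source, multiplied by the logarithm of the collection size over the number of sources in which the pattern matches at all. That exact form, the function name `tf_isf`, and the example index are assumptions made for illustration.

```python
import math


def tf_isf(pattern, source, cardinalities):
    """Weight of a triple pattern for one source, by analogy with TF-IDF.

    cardinalities: dict source -> dict pattern -> number of matching triples.
    Assumed form: card(pattern, source) * log(|S| / number of sources matching pattern).
    """
    tf = cardinalities[source].get(pattern, 0)
    matching = sum(1 for cards in cardinalities.values() if cards.get(pattern, 0) > 0)
    if tf == 0 or matching == 0:
        return 0.0
    return tf * math.log(len(cardinalities) / matching)


# Hypothetical index: the rdf:type pattern matches in every source, the author pattern in one.
index = {
    "s1": {"?x rdf:type ?y": 50, "?x swrc:author ?y": 10},
    "s2": {"?x rdf:type ?y": 40},
    "s3": {"?x rdf:type ?y": 60},
}

type_weight = tf_isf("?x rdf:type ?y", "s1", index)
author_weight = tf_isf("?x swrc:author ?y", "s1", index)
```

In this example the pattern that matches in every source receives weight zero despite its high cardinality, while the rarer author pattern dominates the score of `s1`, which is the discriminating effect the slide describes.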
  • 17. Source Features and Metrics - Links. A source linked from many other sources is more relevant. Relevance is higher when these links match query predicates. Links are only discovered at run-time.
  • 18. Metric Correction and Refinement. During query processing new information becomes available: intermediate join results, links. Refine and correct previously computed metrics. Important in the case of non-discriminative patterns. Instantiate the triple pattern of a join with samples of intermediate results to obtain better join size estimates. Example (figure labels): intermediate results in SHJ operator; perform triple pattern cardinality lookups.
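One way to read the sampling step: take a sample of the bindings currently held in a join operator's hash table, instantiate the triple pattern with each sampled binding, look up its cardinality, and scale the total up to the full set of bindings. The sketch below follows that reading; `estimate_join_size`, the `lookup_cardinality` callback, and the example cardinalities are hypothetical, and the linear scaling is an assumption.

```python
import random


def estimate_join_size(bindings, lookup_cardinality, sample_size, rng=None):
    """Estimate how many triples of a source join with the intermediate results.

    bindings: values bound to the join variable in the operator's hash table.
    lookup_cardinality: hypothetical index lookup, binding -> cardinality of the
        triple pattern instantiated with that binding.
    """
    if not bindings:
        return 0.0
    rng = rng or random.Random(0)
    k = min(sample_size, len(bindings))
    sample = rng.sample(bindings, k)
    sampled_total = sum(lookup_cardinality(b) for b in sample)
    # Scale the sampled total up to the full set of intermediate results.
    return sampled_total * len(bindings) / k


# Hypothetical cardinalities for the pattern (<binding> swrc:author ?author) in one source.
cards = {"paper1": 3, "paper2": 1, "paper3": 0, "paper4": 2}
bindings = list(cards)

exact = estimate_join_size(bindings, cards.get, sample_size=10)
rough = estimate_join_size(bindings, cards.get, sample_size=2)
```

With a sample at least as large as the binding set the estimate equals the true join size; smaller samples need fewer index lookups but can be off, which is the cost and accuracy trade-off behind a sample size parameter.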
  • 19. Ranking at Run-time. Optimization goal: early result reporting. Indexed sources: triple and join pattern cardinality, TF-ISF, weighted links, sampled join size estimates. Discovered sources: weighted links. Ranking has to be refined at run-time. Parameters influencing the behavior and cost of the ranking process: Invalid Score Threshold: ranking is performed when the number of sources with invalid scores passes a threshold. Sample Size: larger samples for join size estimation give better estimates, but are also more costly. Resampling Threshold: cache join size estimates and perform sampling only when the hash table of a join operator grows past a given threshold.
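The Invalid Score Threshold amounts to batching: scores are recomputed only once enough of them have become stale. A minimal sketch of that idea follows; the class `LazyRanker`, its method names, and the toy scoring function are hypothetical, and "passes a threshold" is read here as "strictly exceeds".

```python
class LazyRanker:
    """Re-rank only when enough source scores have become invalid (sketch)."""

    def __init__(self, score_fn, invalid_score_threshold):
        self.score_fn = score_fn  # hypothetical: source -> relevance score
        self.threshold = invalid_score_threshold
        self.scores = {}
        self.invalid = set()
        self.rank_runs = 0

    def invalidate(self, source):
        """Mark a source whose score is outdated or that was just discovered."""
        self.invalid.add(source)
        if len(self.invalid) > self.threshold:
            self.rerank()

    def rerank(self):
        # Recompute scores only for the sources marked invalid, then clear the marks.
        for source in self.invalid:
            self.scores[source] = self.score_fn(source)
        self.invalid.clear()
        self.rank_runs += 1

    def best(self):
        return max(self.scores, key=self.scores.get) if self.scores else None


# Toy scoring function: the length of the source name.
ranker = LazyRanker(score_fn=len, invalid_score_threshold=2)
ranker.invalidate("a")
ranker.invalidate("bbb")
runs_before = ranker.rank_runs  # still 0: threshold not yet passed
ranker.invalidate("cc")         # third invalid score passes the threshold of 2
```

A higher threshold means fewer ranking runs but a ranking that lags further behind newly discovered information.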
  • 20. EVALUATION
  • 21. Evaluation. Systems: top-down (TD), bottom-up (BU), mixed (MI). 8 queries over various datasets (DBpedia, Geonames, NYT, Freebase, ...). To make the approaches comparable, sources were restricted to those discoverable by the BU approach: ~6200 sources, containing ~500k triples. Sources were hosted on a local proxy server with an artificial delay of 2 seconds. 25% of sources were randomly chosen to construct the index for MI.
  • 22. Results. Overall early result reporting: 25% of results: MI 8.7s, BU 15.1s; 50% of results: MI 12.8s, BU 22.0s; improvement of ~42%. Detailed results for two queries (Q1 = Query 1, Q6 = Query 6):

      Metric               Q1 BU     Q1 MI     Q1 TD     Q6 BU     Q6 MI     Q6 TD
      25% Results        24810.5   10300.0   11038.0    8222.5    4743.5    5545.0
      50% Results        43464.5   40782.0   15787.0   10961.5    7650.5    5634.0
      Total              84066.5   86895.5   44323.5   24086.0   20711.0   16469.0
      Src. Selection         0.0     853.0    1444.5       0.0    1331.0    1863.5
      Ranking               25.5    2404.0     411.5      23.5     292.5     335.0
      #Sources               622       612       154       236        92        49
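The slide does not say how the ~42% figure was derived. One reading that is consistent with the reported times is the relative reduction of MI against BU, averaged over the 25% and 50% marks; the short check below assumes that reading, and the function name is hypothetical.

```python
def relative_improvement(baseline_s, improved_s):
    """Fraction of time saved relative to the baseline."""
    return (baseline_s - improved_s) / baseline_s


# Times to the first 25% and 50% of results, as reported on the slide (seconds).
at_25 = relative_improvement(15.1, 8.7)   # BU vs. MI at 25% of results
at_50 = relative_improvement(22.0, 12.8)  # BU vs. MI at 50% of results
average = (at_25 + at_50) / 2
```

Both marks come out close to 0.42, so the headline number matches under this interpretation.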
  • 23. Result Arrival Times
  • 24. Ranking Heuristics
  • 25. Conclusion. Mixed strategy for Linked Data Query Processing: partial knowledge available beforehand, combined with source discovery at run-time. Corrective Source Ranking: metrics for source relevancy; refinement of ranking at run-time. Stream-based Query Processing. Early results reported on average 42% faster. Future work: adapt the query plan to changing properties of incoming data; query local and remote data.