SlideShare uma empresa Scribd logo
1 de 13
Baixar para ler offline
DB vs RDF: structure vs correlation
   +comments on data integration & SW
                   Peter Boncz
        Senior Research Scientist @ CWI
     Lecturer @ Vrije Universiteit Amsterdam
        Architect & Co-founder MonetDB
       Architect & Co-founder VectorWise




            STI Summit, July 6, 2011 Riga. DB vs
                RDF: structure vs correlation      1
Correlations

    Real data is highly correlated
    }  (gender, location) ó firstname
    }  (profession,age) ó income


    But…

    database systems assume attribute independence
    }  wrong assumption leads to suboptimal query plan




 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   2
Example: co-authorship query
    DBLP
    “find authors who published in VLDB, SIGMOD and Bioinformatics”


    SELECT       author
    FROM publications p1, p2, p3
    WHERE        p1.venue = VLDB and
                 p2.venue = SIGMOD and
                 p3.venue = Bioinformatics and
                 p1.author = = p2.author and
                 p1.author p2.author and
                 p1.author = = p3.author and
                 p1.author p3.author and
                 p2.author = = p3.author
                 p2.author p3.author

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   3
Correlations: RDBMS vs RDF

    Schematic structure in RDF data hidden in correlations,
      which are everywhere

    SPARQL leads to plans with
    }  many self-joins
    }  whose hit-ratio is correlated (with eg selections)


    Relational query optimizers do not have a clue
     è all self-joins look the same to it
     è random join order, bad query plans L

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   4
RDF Engines to the Next Level

    Challenge: solving the correlation problem.

    Ideas?
    }  Interleave optimization and execution
          }    Run-time sampling to detect true selectivities
          }    Run-time query (re-)planning

    }    Tackling when there are long join chains
          }    Creating partial path indexes
          }    Graph “cracking”: index build as side-effect of query
          }    .

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   5
Stratos Idreos: database cracking

    Challenge: solving the correlation problem.

    Ideas?
    }  Interleave optimization and execution
          }    Run-time sampling to detect true selectivities
          }    Run-time query (re-)planning

    }    Tackling when there are long join chains
          }    Creating partial path indexes
          }    Graph “cracking”: index build as side-effect of query
          }    .

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   6
RDF Engines to the Next Level

    Challenge: solving the correlation problem.

    Ideas?
    }  Interleave optimization and execution
          }    Run-time sampling to detect true selectivities
          }    Run-time query (re-)planning

    }    Tackling when there are long join chains
          }    Creating partial path indexes
          }    Graph “cracking”: index build as side-effect of query
          }    “recycling” intermediate results

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   7
RDF Engines to the Next Level
    Benchmarking in LOD2:
    }  Attempting to engage vendors to collaborate
        “a TPC for RDF”

    }    New and more challenging benchmarks
          “Suitably designed benchmarks drive progress”

          benchmarks with:
          }    complex query patterns on large data
          }    geo and text data and queries
          }    outside Linked Open Datasets that get joined to synthetic data
          }    highly interlinked graph structures and queries on them
          }    correlated query predicates

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation    8
Social Intelligence Benchmark (SIB)
    RDF-friendly benchmark simulating a huge social network
    }  Social Graphs have understandable, interesting, scenarios
    }  Social Graphs are highly connected
       }    LOD in the wild not yet

    + Exploiting knowledge bases from interlinked RDF datasets
      è not only synthetic data, also linking out to DBpedia

    conversation topics, real world concepts, geographical            information,
      connectivity, social network analysis

     Data correlations galore

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation    9
thoughts on.. Information Integration

    Data integration and Schema Integration
    }  Different applications, organizations, time motives
        hard problem, tens of B$/y

    }    Has been on the DB R&D menu for 20+ years
          }    AI complete, immature tech
          }    hard to achieve high precision automatically

    }    Semantic Web does not solve this issue
          }    In fact, it is its major hurdle to success
          }    Information integration != {Reasoning,Inference,Logic}

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   10
The schema.org Approach

    choose
    }  web-addressable, machine-readable schema+data
    over
    }  ragged, graph, schema-last RDF data model


    Approach:
    }  schema-first, centralized, controlled
    }  well-defined use case (web search = ad money)




 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   11
The Watson Approach

    Winning Jeopardy! Is pretty cool

    No central role for reasoning, inference there

    Recipe:
    }  Finding statistical evidence in Big Data
    }  Using semantics for focused sub-tasks (only)
    }  Intelligently combining multiple approaches
    }  Focusing on the Jeopardy! problem at hand


 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   12
DBMS Architecture for new HW

    “hardware-conscious database architectures”
    }  focus on exploiting modern hardware
    }  platform independent, flexible
    }  highest performance per core
    }  green
    }  scalability to ~10 TB, currently




    examples:

 STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation   13

Mais conteúdo relacionado

Mais procurados

Neo4j GraphTour New York_Thomson Reuters SS
Neo4j GraphTour New York_Thomson Reuters SSNeo4j GraphTour New York_Thomson Reuters SS
Neo4j GraphTour New York_Thomson Reuters SSNeo4j
 
RDF and the Semantic Web -- Joanna Pszenicyn
RDF and the Semantic Web -- Joanna PszenicynRDF and the Semantic Web -- Joanna Pszenicyn
RDF and the Semantic Web -- Joanna PszenicynRichard.Sapon-White
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF Navid Sedighpour
 
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Ontotext
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfHarsh Thakkar
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphIoan Toma
 
DataTables view CKAN monthly live
DataTables view   CKAN monthly liveDataTables view   CKAN monthly live
DataTables view CKAN monthly liveJoel Natividad
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsPeter Haase
 
Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Dippy Aggarwal
 
Linked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureMichele Pasin
 
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data LinkingAnalytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data LinkingOntotext
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...Ontotext
 

Mais procurados (20)

Graham Cousins
Graham CousinsGraham Cousins
Graham Cousins
 
Neo4j GraphTour New York_Thomson Reuters SS
Neo4j GraphTour New York_Thomson Reuters SSNeo4j GraphTour New York_Thomson Reuters SS
Neo4j GraphTour New York_Thomson Reuters SS
 
RDF and the Semantic Web -- Joanna Pszenicyn
RDF and the Semantic Web -- Joanna PszenicynRDF and the Semantic Web -- Joanna Pszenicyn
RDF and the Semantic Web -- Joanna Pszenicyn
 
Scalable Web Data Management using RDF
Scalable Web Data Management using RDF  Scalable Web Data Management using RDF
Scalable Web Data Management using RDF
 
LODAC Museum -- Connecting Museums with LOD --
LODAC Museum -- Connecting Museums with LOD --LODAC Museum -- Connecting Museums with LOD --
LODAC Museum -- Connecting Museums with LOD --
 
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
 
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sfSparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
Sparql querying of-property-graphs-harsh thakkar-graph day 2017 sf
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
Semantic web an overview and projects
Semantic web   an  overview and projectsSemantic web   an  overview and projects
Semantic web an overview and projects
 
LD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and toolsLD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and tools
 
DataTables view CKAN monthly live
DataTables view   CKAN monthly liveDataTables view   CKAN monthly live
DataTables view CKAN monthly live
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
 
Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...
 
Linked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer Nature
 
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data LinkingAnalytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking
 
TinkerPop 2020
TinkerPop 2020TinkerPop 2020
TinkerPop 2020
 
Towards a Query-by-Example System for Knowledge Graphs
Towards a Query-by-Example System for Knowledge GraphsTowards a Query-by-Example System for Knowledge Graphs
Towards a Query-by-Example System for Knowledge Graphs
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ...
 

Destaque

Guia de estudios para la materia de teleprocesos 15 b
Guia de estudios para la materia de teleprocesos 15 bGuia de estudios para la materia de teleprocesos 15 b
Guia de estudios para la materia de teleprocesos 15 bmanuelcucosta
 
Casarte sin-gafas-y-sin-lentillas
Casarte sin-gafas-y-sin-lentillasCasarte sin-gafas-y-sin-lentillas
Casarte sin-gafas-y-sin-lentillasDiego Tondonia
 
CEONET Introduction & Overview
CEONET Introduction & OverviewCEONET Introduction & Overview
CEONET Introduction & OverviewPoonam Sagar
 
Communiqué de presse Manuel Valls
Communiqué de presse Manuel VallsCommuniqué de presse Manuel Valls
Communiqué de presse Manuel Vallstdeszpot
 
4 cchladlcvmt06
4 cchladlcvmt064 cchladlcvmt06
4 cchladlcvmt06Geraa Ufms
 
Tibb m3 u4-reporte_recursosweb2.0_personal_actividadopcional
Tibb m3 u4-reporte_recursosweb2.0_personal_actividadopcionalTibb m3 u4-reporte_recursosweb2.0_personal_actividadopcional
Tibb m3 u4-reporte_recursosweb2.0_personal_actividadopcionalBlanca Tirado
 
Amazon Repricing More sales,More Buy Box, More Time - XSellco
Amazon Repricing  More sales,More Buy Box, More Time - XSellcoAmazon Repricing  More sales,More Buy Box, More Time - XSellco
Amazon Repricing More sales,More Buy Box, More Time - XSellcoDaytodayebay
 
BUENOS AIRES BESUCH
BUENOS AIRES BESUCHBUENOS AIRES BESUCH
BUENOS AIRES BESUCHmirtangel
 
Tutorial de Twitter para Principiantes
Tutorial de Twitter para PrincipiantesTutorial de Twitter para Principiantes
Tutorial de Twitter para PrincipiantesJavier Cejudo
 
Shephard/Phillips TMAU webinar 09/12
Shephard/Phillips TMAU webinar 09/12Shephard/Phillips TMAU webinar 09/12
Shephard/Phillips TMAU webinar 09/12meboresearch
 
Act 5.3 teoía valor colaborativa Equipo 1
Act 5.3 teoía valor colaborativa  Equipo 1Act 5.3 teoía valor colaborativa  Equipo 1
Act 5.3 teoía valor colaborativa Equipo 1Ulrich Von Lichtenstien
 
Lugares bíblicos para visitar
Lugares bíblicos para visitarLugares bíblicos para visitar
Lugares bíblicos para visitarYeshiva Torah
 

Destaque (20)

Guia de estudios para la materia de teleprocesos 15 b
Guia de estudios para la materia de teleprocesos 15 bGuia de estudios para la materia de teleprocesos 15 b
Guia de estudios para la materia de teleprocesos 15 b
 
Linfen
Linfen Linfen
Linfen
 
Casarte sin-gafas-y-sin-lentillas
Casarte sin-gafas-y-sin-lentillasCasarte sin-gafas-y-sin-lentillas
Casarte sin-gafas-y-sin-lentillas
 
Caixa Econômica Federal
Caixa Econômica  FederalCaixa Econômica  Federal
Caixa Econômica Federal
 
CEONET Introduction & Overview
CEONET Introduction & OverviewCEONET Introduction & Overview
CEONET Introduction & Overview
 
PM Kölle putzmunter.pdf
PM Kölle putzmunter.pdfPM Kölle putzmunter.pdf
PM Kölle putzmunter.pdf
 
60 0717 no temáis
60 0717 no temáis60 0717 no temáis
60 0717 no temáis
 
Communiqué de presse Manuel Valls
Communiqué de presse Manuel VallsCommuniqué de presse Manuel Valls
Communiqué de presse Manuel Valls
 
Trabaja en la red inmobiliaria mariatomasa.com
Trabaja en la red inmobiliaria mariatomasa.comTrabaja en la red inmobiliaria mariatomasa.com
Trabaja en la red inmobiliaria mariatomasa.com
 
4 cchladlcvmt06
4 cchladlcvmt064 cchladlcvmt06
4 cchladlcvmt06
 
Tibb m3 u4-reporte_recursosweb2.0_personal_actividadopcional
Tibb m3 u4-reporte_recursosweb2.0_personal_actividadopcionalTibb m3 u4-reporte_recursosweb2.0_personal_actividadopcional
Tibb m3 u4-reporte_recursosweb2.0_personal_actividadopcional
 
Amazon Repricing More sales,More Buy Box, More Time - XSellco
Amazon Repricing  More sales,More Buy Box, More Time - XSellcoAmazon Repricing  More sales,More Buy Box, More Time - XSellco
Amazon Repricing More sales,More Buy Box, More Time - XSellco
 
BUENOS AIRES BESUCH
BUENOS AIRES BESUCHBUENOS AIRES BESUCH
BUENOS AIRES BESUCH
 
El genio de la lampara
El genio de la lamparaEl genio de la lampara
El genio de la lampara
 
Conoce Jerez
Conoce JerezConoce Jerez
Conoce Jerez
 
Duhalde
DuhaldeDuhalde
Duhalde
 
Tutorial de Twitter para Principiantes
Tutorial de Twitter para PrincipiantesTutorial de Twitter para Principiantes
Tutorial de Twitter para Principiantes
 
Shephard/Phillips TMAU webinar 09/12
Shephard/Phillips TMAU webinar 09/12Shephard/Phillips TMAU webinar 09/12
Shephard/Phillips TMAU webinar 09/12
 
Act 5.3 teoía valor colaborativa Equipo 1
Act 5.3 teoía valor colaborativa  Equipo 1Act 5.3 teoía valor colaborativa  Equipo 1
Act 5.3 teoía valor colaborativa Equipo 1
 
Lugares bíblicos para visitar
Lugares bíblicos para visitarLugares bíblicos para visitar
Lugares bíblicos para visitar
 

Semelhante a STI Summit 2011 - DB vs RDF

OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4JOUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4Jijcsity
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedSören Auer
 
Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1ErhardRahm
 
The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so farElena Simperl
 
Bridging the gap between the semantic web and big data: answering SPARQL que...
Bridging the gap between the semantic web and big data:  answering SPARQL que...Bridging the gap between the semantic web and big data:  answering SPARQL que...
Bridging the gap between the semantic web and big data: answering SPARQL que...IJECEIAES
 
Pragmatic Approaches to the Semantic Web
Pragmatic Approaches to the Semantic WebPragmatic Approaches to the Semantic Web
Pragmatic Approaches to the Semantic WebMike Bergman
 
Query Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph DatabasesQuery Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph Databasesijdms
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityunivTope Omitola
 
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...Holistic Benchmarking of Big Linked Data
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
 
Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.Synaptica, LLC
 
X api chinese cop monthly meeting feb.2016
X api chinese cop monthly meeting   feb.2016X api chinese cop monthly meeting   feb.2016
X api chinese cop monthly meeting feb.2016Jessie Chuang
 
Spring Data Neo4j Intro SpringOne 2011
Spring Data Neo4j Intro SpringOne 2011Spring Data Neo4j Intro SpringOne 2011
Spring Data Neo4j Intro SpringOne 2011jexp
 
What Factors Influence the Design of a Linked Data Generation Algorithm?
What Factors Influence the Design of a Linked Data Generation Algorithm?What Factors Influence the Design of a Linked Data Generation Algorithm?
What Factors Influence the Design of a Linked Data Generation Algorithm?andimou
 
Analysis of different similarity measures: Simrank
Analysis of different similarity measures: SimrankAnalysis of different similarity measures: Simrank
Analysis of different similarity measures: SimrankAbhishek Mungoli
 
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB ijfcstjournal
 

Semelhante a STI Summit 2011 - DB vs RDF (20)

OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4JOUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
 
RDF data clustering
RDF data clusteringRDF data clustering
RDF data clustering
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
 
Bridging the gap between the semantic web and big data: answering SPARQL que...
Bridging the gap between the semantic web and big data:  answering SPARQL que...Bridging the gap between the semantic web and big data:  answering SPARQL que...
Bridging the gap between the semantic web and big data: answering SPARQL que...
 
Presentation at MTSR 2012
Presentation at MTSR 2012Presentation at MTSR 2012
Presentation at MTSR 2012
 
Pragmatic Approaches to the Semantic Web
Pragmatic Approaches to the Semantic WebPragmatic Approaches to the Semantic Web
Pragmatic Approaches to the Semantic Web
 
Query Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph DatabasesQuery Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph Databases
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
An RDF Dataset Generator for the Social Network Benchmark with Real-World Coh...
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.
 
X api chinese cop monthly meeting feb.2016
X api chinese cop monthly meeting   feb.2016X api chinese cop monthly meeting   feb.2016
X api chinese cop monthly meeting feb.2016
 
Spring Data Neo4j Intro SpringOne 2011
Spring Data Neo4j Intro SpringOne 2011Spring Data Neo4j Intro SpringOne 2011
Spring Data Neo4j Intro SpringOne 2011
 
What Factors Influence the Design of a Linked Data Generation Algorithm?
What Factors Influence the Design of a Linked Data Generation Algorithm?What Factors Influence the Design of a Linked Data Generation Algorithm?
What Factors Influence the Design of a Linked Data Generation Algorithm?
 
Analysis of different similarity measures: Simrank
Analysis of different similarity measures: SimrankAnalysis of different similarity measures: Simrank
Analysis of different similarity measures: Simrank
 
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
 

Mais de Semantic Technology Institute International

Mais de Semantic Technology Institute International (20)

Summit2013 sw in russian universities
Summit2013   sw in russian universitiesSummit2013   sw in russian universities
Summit2013 sw in russian universities
 
Summit2013 semantic web in russia
Summit2013   semantic web in russiaSummit2013   semantic web in russia
Summit2013 semantic web in russia
 
Summit2013 john domingue - introduction
Summit2013   john domingue - introductionSummit2013   john domingue - introduction
Summit2013 john domingue - introduction
 
Summit2013 john domingue - horizon2020
Summit2013   john domingue - horizon2020Summit2013   john domingue - horizon2020
Summit2013 john domingue - horizon2020
 
Summit2013 ho-jin choi - summit2013
Summit2013   ho-jin choi - summit2013Summit2013   ho-jin choi - summit2013
Summit2013 ho-jin choi - summit2013
 
Summit2013 georg gottlob and tim furche - diadem
Summit2013   georg gottlob and tim furche - diademSummit2013   georg gottlob and tim furche - diadem
Summit2013 georg gottlob and tim furche - diadem
 
Summit2013 eventos onto quad
Summit2013   eventos onto quadSummit2013   eventos onto quad
Summit2013 eventos onto quad
 
Summit2013 choi - wise kb-introd
Summit2013   choi - wise kb-introdSummit2013   choi - wise kb-introd
Summit2013 choi - wise kb-introd
 
Summit2013 choi - kaist-cs-intro
Summit2013   choi - kaist-cs-introSummit2013   choi - kaist-cs-intro
Summit2013 choi - kaist-cs-intro
 
STI Summit 2011 - Conclusion
STI Summit 2011 - ConclusionSTI Summit 2011 - Conclusion
STI Summit 2011 - Conclusion
 
STI Summit 2011 - Dynamic web
STI Summit 2011 - Dynamic webSTI Summit 2011 - Dynamic web
STI Summit 2011 - Dynamic web
 
STI Summit 2011 - Mlr-sm
STI Summit 2011 - Mlr-smSTI Summit 2011 - Mlr-sm
STI Summit 2011 - Mlr-sm
 
STI Summit 2011 - Linked data-services-streams
STI Summit 2011 - Linked data-services-streamsSTI Summit 2011 - Linked data-services-streams
STI Summit 2011 - Linked data-services-streams
 
STI Summit 2011 - Linked services
STI Summit 2011 - Linked servicesSTI Summit 2011 - Linked services
STI Summit 2011 - Linked services
 
STI Summit 2011 - di@scale
STI Summit 2011 - di@scaleSTI Summit 2011 - di@scale
STI Summit 2011 - di@scale
 
STI Summit 2011 - A personal look at the future of Semantic Technologies
STI Summit 2011 - A personal look at the future of Semantic TechnologiesSTI Summit 2011 - A personal look at the future of Semantic Technologies
STI Summit 2011 - A personal look at the future of Semantic Technologies
 
STI Summit 2011 - Visual analytics and linked data
STI Summit 2011 - Visual analytics and linked dataSTI Summit 2011 - Visual analytics and linked data
STI Summit 2011 - Visual analytics and linked data
 
STI Summit 2011 - LS4 LS Khaos
STI Summit 2011 - LS4 LS KhaosSTI Summit 2011 - LS4 LS Khaos
STI Summit 2011 - LS4 LS Khaos
 
STI Summit 2011 - Making linked data work
STI Summit 2011 - Making linked data workSTI Summit 2011 - Making linked data work
STI Summit 2011 - Making linked data work
 
STI Summit 2011 - Shortipedia
STI Summit 2011 - ShortipediaSTI Summit 2011 - Shortipedia
STI Summit 2011 - Shortipedia
 

Último

How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17Celine George
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17Celine George
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxKatherine Villaluna
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICESayali Powar
 
The Singapore Teaching Practice document
The Singapore Teaching Practice documentThe Singapore Teaching Practice document
The Singapore Teaching Practice documentXsasf Sfdfasd
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and stepobaje godwin sunday
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxDr. Asif Anas
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...Nguyen Thanh Tu Collection
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptxraviapr7
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxDr. Santhosh Kumar. N
 
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfTechSoup
 
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapitolTechU
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxKatherine Villaluna
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfYu Kanazawa / Osaka University
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...raviapr7
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxAditiChauhan701637
 
Philosophy of Education and Educational Philosophy
Philosophy of Education  and Educational PhilosophyPhilosophy of Education  and Educational Philosophy
Philosophy of Education and Educational PhilosophyShuvankar Madhu
 
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxPISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxEduSkills OECD
 

Último (20)

How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17How to Use api.constrains ( ) in Odoo 17
How to Use api.constrains ( ) in Odoo 17
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
 
The Singapore Teaching Practice document
The Singapore Teaching Practice documentThe Singapore Teaching Practice document
The Singapore Teaching Practice document
 
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdfPersonal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and step
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptx
 
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
 
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptxCapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptx
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptx
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptx
 
Finals of Kant get Marx 2.0 : a general politics quiz
Finals of Kant get Marx 2.0 : a general politics quizFinals of Kant get Marx 2.0 : a general politics quiz
Finals of Kant get Marx 2.0 : a general politics quiz
 
Philosophy of Education and Educational Philosophy
Philosophy of Education  and Educational PhilosophyPhilosophy of Education  and Educational Philosophy
Philosophy of Education and Educational Philosophy
 
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxPISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
 

STI Summit 2011 - DB vs RDF

  • 1. DB vs RDF: structure vs correlation +comments on data integration & SW Peter Boncz Senior Research Scientist @ CWI Lecturer @ Vrije Universiteit Amsterdam Architect & Co-founder MonetDB Architect & Co-founder VectorWise STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 1
  • 2. Correlations Real data is highly correlated }  (gender, location) ó firstname }  (profession,age) ó income But… database systems assume attribute independence }  wrong assumption leads to suboptimal query plan STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 2
  • 3. Example: co-authorship query DBLP “find authors who published in VLDB, SIGMOD and Bioinformatics” SELECT author FROM publications p1, p2, p3 WHERE p1.venue = VLDB and p2.venue = SIGMOD and p3.venue = Bioinformatics and p1.author = = p2.author and p1.author p2.author and p1.author = = p3.author and p1.author p3.author and p2.author = = p3.author p2.author p3.author STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 3
  • 4. Correlations: RDBMS vs RDF Schematic structure in RDF data hidden in correlations, which are everywhere SPARQL leads to plans with }  many self-joins }  whose hit-ratio is correlated (with eg selections) Relational query optimizers do not have a clue è all self-joins look the same to it è random join order, bad query plans L STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 4
  • 5. RDF Engines to the Next Level Challenge: solving the correlation problem. Ideas? }  Interleave optimization and execution }  Run-time sampling to detect true selectivities }  Run-time query (re-)planning }  Tackling when there are long join chains }  Creating partial path indexes }  Graph “cracking”: index build as side-effect of query }  . STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 5
  • 6. Stratos Idreos: database cracking Challenge: solving the correlation problem. Ideas? }  Interleave optimization and execution }  Run-time sampling to detect true selectivities }  Run-time query (re-)planning }  Tackling when there are long join chains }  Creating partial path indexes }  Graph “cracking”: index build as side-effect of query }  . STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 6
  • 7. RDF Engines to the Next Level Challenge: solving the correlation problem. Ideas? }  Interleave optimization and execution }  Run-time sampling to detect true selectivities }  Run-time query (re-)planning }  Tackling when there are long join chains }  Creating partial path indexes }  Graph “cracking”: index build as side-effect of query }  “recycling” intermediate results STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 7
  • 8. RDF Engines to the Next Level Benchmarking in LOD2: }  Attempting to engage vendors to collaborate “a TPC for RDF” }  New and more challenging benchmarks “Suitably designed benchmarks drive progress” benchmarks with: }  complex query patterns on large data }  geo and text data and queries }  outside Linked Open Datasets that get joined to synthetic data }  highly interlinked graph structures and queries on them }  correlated query predicates STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 8
  • 9. Social Intelligence Benchmark (SIB) RDF-friendly benchmark simulating a huge social network }  Social Graphs have understandable, interesting, scenarios }  Social Graphs are highly connected }  LOD in the wild not yet + Exploiting knowledge bases from interlinked RDF datasets è not only synthetic data, also linking out to DBpedia conversation topics, real world concepts, geographical information, connectivity, social network analysis Data correlations galore STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 9
  • 10. thoughts on.. Information Integration Data integration and Schema Integration }  Different applications, organizations, time motives hard problem, tens of B$/y }  Has been on the DB R&D menu for 20+ years }  AI complete, immature tech }  hard to achieve high precision automatically }  Semantic Web does not solve this issue }  In fact, it is its major hurdle to success }  Information integration != {Reasoning,Inference,Logic} STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 10
  • 11. The schema.org Approach choose }  web-addressable, machine-readable schema+data over }  ragged, graph, schema-last RDF data model Approach: }  schema-first, centralized, controlled }  well-defined use case (web search = ad money) STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 11
  • 12. The Watson Approach Winning Jeopardy! Is pretty cool No central role for reasoning, inference there Recipe: }  Finding statistical evidence in Big Data }  Using semantics for focused sub-tasks (only) }  Intelligently combining multiple approaches }  Focusing on the Jeopardy! problem at hand STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 12
  • 13. DBMS Architecture for new HW “hardware-conscious database architectures” }  focus on exploiting modern hardware }  platform independent, flexible }  highest performance per core }  green }  scalability to ~10 TB, currently examples: STI Summit, July 6, 2011 Riga. DB vs RDF: structure vs correlation 13