Digital Worlds (applications)
 ❑  VEC (Enterprise Scale)
     •  1,300 source databases
     •  10+ million views (via data integration)

 ❑  US Healthcare (National Scale)
     •  Scale
          o  Health care and social assistance offices: 784,626, including
               •  Doctors' offices: 220,131
               •  Dentists: 127,057
               •  Hospitals: 6,505
               •  Clinics: ~5,000 (each roughly an SME with, say, 100 databases)
          o  Patients: 100-300+ million
          o  Databases: ~32 million
     •  Scope
          o  Comprehensive medical events, methods, analysis, …
               •  E.g., Alice (62) in Emergency Room with liver failure
          o  Insurance, payments, …
          o  New metric: healthcare quality
     •  Examples
          o  SHRINE (2009): 3 hospitals; uses 2,381,883 distinct concepts (ontologies)
          o  HHS CTO (Todd Park): Open Health Data Initiative
          o  US (PCAST, White House) vision
Observations
 ❑  Data Sources
     •  Massive
          o  Number
          o  Heterogeneity
          o  Distribution (data at source)
          o  Constant change – data, model, ontology, business rules, …
     •  Constrained
          o  Governance: privacy, confidentiality, legal, …
          o  Quality, correctness, precision, …
          o  Competition

 ❑  Critical Requirement: meaningful
     •  Human lives
     •  Health of individuals, communities, nation
     •  Economic impact: $ trillions / year
     •  Political: meaningless debates
Trends
 ❑  Digital Universe

 ❑  Holistic Views
     •  Information Ecosystems: data
     •  Ecosystems: Processes over services

 ❑  Big Data: massive
          o  Number
          o  Distribution
          o  Heterogeneity
               •  Semantics
               •  Structure: relational databases, X databases, web, deep web
               •  Technology: databases, data warehouses, files, …

 ❑  New Models: problem solving, data, …
     •  Data-driven
     •  Social computing: data as social artifacts
     •  Science: Wolfram Alpha
     •  Pragmatics: Driven by healthcare quality improvement
Databases and AI: The Twain Just Met
 ❑  Database World
     •  Engineering (RDBMSs) @ scale
     •  Reasoning: Relational model (first-order logic, FOL)

 ❑  AI World
     •  Reasoning: more powerful & expressive
     •  Engineering: in the small

 ❑  Digital Universe, e.g., Web
     •  Reasoning: beyond the RDM & AI?
     •  Engineering: way beyond RDBMS

 ❑  Information ecosystems
     •  Databases: join
     •  Web: link

 Power Law of Data
 The value of a data element is proportional to the number of its meaningful uses.
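The contrast between "Databases: join" and "Web: link" can be made concrete with a small sketch. The Python below is only an illustration under assumed toy data: the record fields, the link predicates, and the helper names `join` and `follow` are hypothetical and not taken from the slides, apart from the Alice example. A join pairs rows that share a key value; a link is followed from one resource to the next.

```python
# Hypothetical toy data; field names and predicates are illustrative only.
patients = [
    {"patient_id": 1, "name": "Alice", "age": 62},
    {"patient_id": 2, "name": "Bob", "age": 47},
]
visits = [
    {"patient_id": 1, "ward": "Emergency Room", "diagnosis": "liver failure"},
    {"patient_id": 2, "ward": "Clinic", "diagnosis": "influenza"},
]


def join(left, right, key):
    """Relational-style equi-join: pair rows whose key values match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Web-style data: (subject, predicate, object) links between resources.
links = [
    ("alice", "admittedTo", "emergency_room"),
    ("alice", "diagnosedWith", "liver_failure"),
    ("liver_failure", "treatedBy", "hepatology"),
]


def follow(start, graph, max_hops=2):
    """Collect every resource reachable from `start` within `max_hops` links."""
    seen, frontier = {start}, {start}
    for _ in range(max_hops):
        frontier = {obj for subj, _pred, obj in graph if subj in frontier} - seen
        seen |= frontier
    return seen - {start}


joined = join(patients, visits, "patient_id")   # one combined row per matching pair
reachable = follow("alice", links)              # resources discovered by traversal
```

The join needs both inputs to agree on a key column up front, whereas link traversal discovers related resources one hop at a time without a shared schema.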
What Underlies the Digital Universe

   Modelling            Execution

   Data Models          DBMS Engines
   Languages            Algorithms
   Semantics            Semantics
   Problem Solving      Computation
What Underlies the Data Universe

   Relational Data Model     Data Independence     RDBMS

   Semantics                                       Semantics
   Problem Solving                                 Computation
Relational Database Improvements
 ❑  Pre-Relational
     •  Hierarchical
     •  Network
 ❑  Relational
     •  Row store
     •  OLAP / Data Warehouse
 ❑  Post-Relational
     •  RDF store
     •  Column store
     •  Bare bones relational
     •  Stream / complex event processing
 ❑  Push Down
     •  Database / data warehouse appliances (20+ on the market)
     •  In-database analytics, … (10+ on the market)
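Data independence, in the sense used on these slides, means a query is written against the logical model and keeps working when the physical layout changes, for example from a row store to a column store. A minimal sketch, assuming hypothetical classes `RowStore` and `ColumnStore` and a hypothetical helper `counts_for` (none of which come from the deck); the office counts are the ones quoted on the first slide:

```python
class RowStore:
    """Physical layout: one dict per record."""

    def __init__(self, rows):
        self.rows = list(rows)

    def select(self, column, where, equals):
        return [r[column] for r in self.rows if r[where] == equals]


class ColumnStore:
    """Physical layout: one list per attribute (assumes at least one row)."""

    def __init__(self, rows):
        rows = list(rows)
        self.columns = {name: [r[name] for r in rows] for name in rows[0]}

    def select(self, column, where, equals):
        values, filters = self.columns[column], self.columns[where]
        return [v for v, f in zip(values, filters) if f == equals]


records = [
    {"kind": "Doctors offices", "count": 220_131},
    {"kind": "Dentists", "count": 127_057},
    {"kind": "Hospitals", "count": 6_505},
]


def counts_for(store, kind):
    """Logical query: written once, unaware of the physical layout."""
    return store.select("count", where="kind", equals=kind)


row_result = counts_for(RowStore(records), "Hospitals")
col_result = counts_for(ColumnStore(records), "Hospitals")
```

Both stores return the same answer to the same logical query; only the storage layout behind `select` differs.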
Data Models For New Domains Must Honor Data Independence
 ❑  Array (Matrix)-store (SciDB) [Linear algebra]
 ❑  XML databases: structured content, information exchange
 ❑  Content management: e.g., SharePoint
 ❑  Graph/network store: social networking (Facebook), link analysis
 ❑  Protein store: protein folding, drug discovery, …
 ❑  Geospatial / map store: location-based applications
 ❑  Time series: signal processing, statistical and financial analysis
 ❑  Cloud / Mesh data (NoSQL) stores: web scale applications
 ❑  and they just keep coming …
Data Universe

 [Figure: nested regions, outermost to innermost: Data Universe, Database Universe, Relational Data Universe]
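Data Universe

 [Figure: the Data Universe containing the Database Universe (DBU), which contains the Relational Data Model (RDM). Other labelled data models: Graph-Network Data Model, Time Series Data Model, Scientific Data Model, Geo-Spatial Data Model, Document Data Model, Digital Media Data Model, etc.]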
Data Integration Solution Space: Data Independence Required

                                  Computation                 Problem Solving
  Databases
    Relational                    Optimal for homogeneous     Optimal for pure
                                  relational data             relational data
    Domain-specific               Emerging                    Emerging
  Semantic Technologies (AI)
    Knowledge Representation      Minimal                     Powerful
    Ontologies                    Minimal                     Powerful
    Semantic Web                  Modest / emerging           Modest / emerging
    Semantic Data Management      Emerging                    Emerging
  Architectural
    Information-As-A-Service      Emerging                    Emerging
    Cloud                         Emerging                    N/A
Databases vs. Semantic Web

  Databases                             Semantic Web
  Discrete Worlds                       Heterogeneous Worlds
  Single Versions of Truth              Multiple Truths
  Data Models                           LOD Models?
  Mathematical Logic                    What Logic?
  Probabilistic / Eventual Reasoning    Common Sense Reasoning?
  DI: Relational Join                   DI: Evidence Gathering

  (Diagram label: 1,000s of databases)
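The slide does not define "evidence gathering", so the following is only one possible reading, sketched in Python under stated assumptions: several sources make possibly conflicting claims, and instead of joining them into a single version of truth, every candidate value is kept and ranked by accumulated support. The claim tuples, the source weights, and the function `gather_evidence` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical claims: (source, entity, attribute, value, source weight in [0, 1]).
claims = [
    ("hospital_a", "alice", "age", 62, 0.9),
    ("clinic_b", "alice", "age", 62, 0.6),
    ("insurer_c", "alice", "age", 61, 0.5),
]


def gather_evidence(claims, entity, attribute):
    """Rank candidate values by accumulated source weight, keeping all of them."""
    support = defaultdict(float)
    for _source, ent, attr, value, weight in claims:
        if ent == entity and attr == attribute:
            support[value] += weight
    # Highest support first; lower-ranked values stay visible as alternatives.
    return sorted(support.items(), key=lambda item: item[1], reverse=True)


ranked = gather_evidence(claims, "alice", "age")
best_value, best_support = ranked[0]
```

A relational join over the same three sources would either drop the disagreeing row or need a rule that picks one value; here the disagreement is part of the result.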
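Databases vs. Web

 [Figure: layered diagram along a "Scale" axis. Centre stack, top to bottom: Web; Data Warehouses; Semantically Heterogeneous Views; Data Management; Semantically Homogeneous Databases. Left-side labels, top to bottom: Multiple versions of truth; Evidence Gathering; Single versions of truth. Right-side labels, top to bottom: Exploration; Analysis / BI.]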
Data Integration
 ❑  Query: define the result
     •  Entity
     •  Computation

 ❑  Find candidate data sets: search  (Hard)
 ❑  Extract, Transform, and Load (ETL): engineering
 ❑  Data Integration
     •  Entity resolution  (Harder)
     •  Integration computation
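The steps listed above can be sketched end to end in a few lines of Python. This is a toy illustration, not the method of any system named in the talk: the source names, field aliases, exact-match entity resolution, and all function names are assumptions made for the example, and real entity resolution is far harder than comparing normalised keys.

```python
from collections import defaultdict

# Hypothetical sources with differing schemas.
sources = {
    "hospital": [{"full_name": "Alice Smith", "dob": "1949-03-02", "visits": 3}],
    "clinic": [
        {"name": "alice  smith ", "birth_date": "1949-03-02", "visits": 2},
        {"name": "Bob Jones", "birth_date": "1964-07-15", "visits": 1},
    ],
    "billing": [{"invoice": 17, "amount": 250.0}],
}

# Query: the target fields, with the source field names accepted for each.
ALIASES = {
    "name": {"name", "full_name"},
    "dob": {"dob", "birth_date"},
    "visits": {"visits"},
}


def find_candidates(sources):
    """Search: keep sources whose first record covers every target field."""
    return {
        name: rows
        for name, rows in sources.items()
        if rows and all(rows[0].keys() & aliases for aliases in ALIASES.values())
    }


def etl(rows):
    """ETL: rename fields to the target schema and normalise names."""
    out = []
    for row in rows:
        rec = {t: row[next(iter(row.keys() & a))] for t, a in ALIASES.items()}
        rec["name"] = " ".join(rec["name"].lower().split())
        out.append(rec)
    return out


def resolve_entities(records):
    """Entity resolution (naive): equal (name, dob) means the same entity."""
    entities = defaultdict(list)
    for rec in records:
        entities[(rec["name"], rec["dob"])].append(rec)
    return entities


def integrate(entities):
    """Integration computation: total visits per resolved entity."""
    return {key: sum(r["visits"] for r in recs) for key, recs in entities.items()}


candidates = find_candidates(sources)
records = [rec for rows in candidates.values() for rec in etl(rows)]
totals = integrate(resolve_entities(records))
```

The billing source is filtered out at the search step because it exposes none of the target fields, and the two spellings of Alice's name collapse into one entity only because the ETL step normalises case and whitespace first.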
Managing Data @ Scale I
 ❑  Introduction
     •  Michael L. Brodie

 ❑  Global Data Integration and Global Data Mining
     •  Chris Bizer

 ❑  DB vs RDF: structure vs correlation
     •  Peter Boncz

STI Summit 2011 - Digital Worlds
