Database , 4 Data Integration
Ali Usman
Distributed DBMS © M. T. Özsu & P. Valduriez, Ch.4
1. Outline
• Introduction
• Background
• Distributed Database Design
• Database Integration
  ➡ Schema Matching
  ➡ Schema Mapping
• Semantic Data Control
• Distributed Query Processing
• Multimedia Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
2. Problem Definition
• Given existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)
  ➡ GCS is also called mediated schema
• Bottom-up design process
3. Integration Alternatives
• Physical integration
  ➡ Source databases integrated and the integrated database is materialized
  ➡ Data warehouses
• Logical integration
  ➡ Global conceptual schema is virtual and not materialized
  ➡ Enterprise Information Integration (EII)
4. Data Warehouse Approach
5. Bottom-up Design
• GCS (also called mediated schema) is defined first
  ➡ Map LCSs to this schema
  ➡ As in data warehouses
• GCS is defined as an integration of parts of LCSs
  ➡ Generate GCS and map LCSs to this GCS
6. GCS/LCS Relationship
• Local-as-view
  ➡ The GCS definition is assumed to exist, and each LCS is treated as a view definition over it
• Global-as-view
  ➡ The GCS is defined as a set of views over the LCSs
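As a rough illustration of the global-as-view case, the sketch below defines one global relation as a view (here, a plain function) over two local sources. The source names, attribute names, and sample rows are hypothetical and are not taken from the slides.

```python
# Hypothetical local sources (LCSs); names and rows are illustrative only.
db1_emp = [{"ENO": 1, "ENAME": "J. Doe", "TITLE": "Elect. Eng."}]
db2_worker = [{"NUMBER": 2, "NAME": "M. Smith", "TITLE": "Syst. Anal."}]


def global_employee():
    """Global-as-view: the global relation is computed as a query over the sources."""
    rows = [
        {"id": r["ENO"], "name": r["ENAME"], "title": r["TITLE"]}
        for r in db1_emp
    ]
    rows += [
        {"id": r["NUMBER"], "name": r["NAME"], "title": r["TITLE"]}
        for r in db2_worker
    ]
    return rows
```

In local-as-view the direction is reversed: each source would be described as a view over an already existing global schema, so there would be no single function like this one that defines the global relation.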
7. Database Integration Process
8. Recall Access Architecture
9. Database Integration Issues
• Schema translation
  ➡ Component database schemas translated to a common intermediate canonical representation
• Schema generation
  ➡ Intermediate schemas are used to create a global conceptual schema
10. Schema Translation
• What is the canonical data model?
  ➡ Relational
  ➡ Entity-relationship
    ✦ DIKE
  ➡ Object-oriented
    ✦ ARTEMIS
  ➡ Graph-oriented
    ✦ DIPE, TranScm, COMA, Cupid
    ✦ Preferable with emergence of XML
    ✦ No common graph formalism
• Mapping algorithms
  ➡ These are well-known
11. Schema Generation
• Schema matching
  ➡ Finding the correspondences between multiple schemas
• Schema integration
  ➡ Creation of the GCS (or mediated schema) using the correspondences
• Schema mapping
  ➡ How to map data from local databases to the GCS
• Important: sometimes the GCS is defined first, and schema matching and schema mapping are done against this target GCS
12. Running Example
Relational:
  EMP(ENO, ENAME, TITLE)
  PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
  ASG(ENO, PNO, RESP, DUR)
  PAY(TITLE, SAL)
E-R Model: (figure not reproduced)
13. Schema Matching
• Schema heterogeneity
  ➡ Structural heterogeneity
    ✦ Type conflicts
    ✦ Dependency conflicts
    ✦ Key conflicts
    ✦ Behavioral conflicts
  ➡ Semantic heterogeneity
    ✦ More important and harder to deal with
    ✦ Synonyms, homonyms, hypernyms
    ✦ Different ontology
    ✦ Imprecise wording
14. Schema Matching (cont’d)
• Other complications
  ➡ Insufficient schema and instance information
  ➡ Unavailability of schema documentation
  ➡ Subjectivity of matching
• Issues that affect schema matching
  ➡ Schema versus instance matching
  ➡ Element versus structure level matching
  ➡ Matching cardinality
15. Schema Matching Approaches
16. Linguistic Schema Matching
• Use element names and other textual information (textual descriptions, annotations)
• May use external sources (e.g., thesauri)
• 〈SC1.element-1 ≈ SC2.element-2, p, s〉
  ➡ Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p holds with a similarity value of s
• Schema level
  ➡ Deal with names of schema elements
  ➡ Handle cases such as synonyms, homonyms, hypernyms, data type similarities
• Instance level
  ➡ Focus on information retrieval techniques (e.g., word frequencies, key terms)
  ➡ “Deduce” similarities from these
17. Linguistic Matchers
• Use a set of linguistic (terminological) rules
• Basic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)
• Predicate p and similarity value s
  ➡ hand-crafted ⇒ specified
  ➡ discovered ⇒ may be computed or specified by an expert after discovery
• Examples
  ➡ 〈uppercase names ≈ lower case names, true, 1.0〉
  ➡ 〈uppercase names ≈ capitalized names, true, 1.0〉
  ➡ 〈capitalized names ≈ lower case names, true, 1.0〉
  ➡ 〈DB1.ASG ≈ DB2.WORKS_IN, true, 0.8〉
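One way to picture such rules in code is as pairs of a predicate and a similarity value, as in the minimal sketch below. The representation, the `name_similarity` helper, and the choice to return 0.0 when no rule applies are assumptions made for illustration; only the rule contents come from the examples above.

```python
# Each rule is (predicate over two element names, similarity value).
# Hand-crafted rules mirroring the slide's examples; the encoding is an assumption.
RULES = [
    # Names that differ only in letter case are treated as identical.
    (lambda a, b: a.lower() == b.lower(), 1.0),
    # A specific correspondence between two schemas, in either order.
    (lambda a, b: {a, b} == {"DB1.ASG", "DB2.WORKS_IN"}, 0.8),
]


def name_similarity(a, b, rules=RULES):
    """Return the highest similarity among rules whose predicate holds, else 0.0."""
    return max((s for p, s in rules if p(a, b)), default=0.0)
```

For instance, `name_similarity("ENAME", "ename")` is 1.0 under the case rule, and `name_similarity("DB2.WORKS_IN", "DB1.ASG")` is 0.8.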
18. Automatic Discovery of Name Similarities
• Affixes
  ➡ Common prefixes and suffixes between two element name strings
• N-grams
  ➡ Comparing how many substrings of length n are common between the two name strings
• Edit distance
  ➡ Number of character modifications (additions, deletions, insertions) that need to be performed to convert one string into the other
• Soundex code
  ➡ Phonetic similarity between names based on their soundex codes
• Also look at data types
  ➡ Data type similarity may suggest a stronger relationship than the similarity computed by these methods, or may help differentiate between multiple strings with the same value
19. N-gram Example
• 3-grams of string “Responsibility” are the following:
  ➡ Res, esp, spo, pon, ons, nsi, sib, ibi, bil, ili, lit, ity
• 3-grams of string “Resp” are
  ➡ Res
  ➡ esp
• 3-gram similarity: 2/12 = 0.17
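The calculation above can be sketched in a few lines. The function names are illustrative, and dividing the number of shared n-grams by the size of the larger n-gram set is an assumption chosen to match the 2/12 figure in the example.

```python
def ngrams(s, n=3):
    """Return the set of all substrings of length n in s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Shared n-grams divided by the size of the larger n-gram set."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0  # one of the strings is shorter than n
    return len(ga & gb) / max(len(ga), len(gb))


print(round(ngram_similarity("Responsibility", "Resp"), 2))  # 0.17
```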
20. Edit Distance Example
• Again consider “Responsibility” and “Resp”
• To convert “Responsibility” to “Resp”
  ➡ Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
• To convert “Resp” to “Responsibility”
  ➡ Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
• The number of edit operations required is 10
• Similarity is 1 − (10/14) = 0.29
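A minimal sketch of this computation follows, using the standard Levenshtein dynamic program. Note that Levenshtein distance also counts substitutions, which the example above does not need; normalizing by the length of the longer string is an assumption chosen to reproduce the 10/14 figure.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


print(edit_distance("Responsibility", "Resp"))              # 10
print(round(edit_similarity("Responsibility", "Resp"), 2))  # 0.29
```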
21. Constraint-based Matchers
• Data always have constraints – use them
  ➡ Data type information
  ➡ Value ranges
  ➡ …
• Examples
  ➡ RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.29 (low)
  ➡ If they come from the same domain, this may increase their similarity value
  ➡ ENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-R
  ➡ ENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRING
22. Constraint-based Structural Matching
• If two schema elements are structurally similar, then there is a higher likelihood that they represent the same concept
• Structural similarity:
  ➡ Same properties (attributes)
  ➡ “Neighborhood” similarity
    ✦ Using graph representation
    ✦ The nodes that can be reached within a particular path length from a node are the neighbors of that node
    ✦ If two concepts (nodes) have a similar set of neighbors, they are likely to represent the same concept
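The neighborhood idea can be sketched as a bounded breadth-first search followed by a set comparison. The slides do not specify how neighbor sets are compared; the Jaccard coefficient used here, the exact-name comparison of neighbors, and the two small schema graphs are all assumptions for illustration.

```python
from collections import deque


def neighbors(graph, start, max_len):
    """Nodes reachable from start within max_len edges (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_len:
            continue  # do not expand beyond the path-length bound
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def neighborhood_similarity(g1, n1, g2, n2, max_len=1):
    """Jaccard overlap of the two neighbor sets (an assumed measure)."""
    a, b = neighbors(g1, n1, max_len), neighbors(g2, n2, max_len)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical schema graphs: each concept points to its attributes.
g1 = {"EMP": ["ENO", "ENAME", "TITLE"]}
g2 = {"WORKER": ["NUMBER", "NAME", "TITLE"]}
score = neighborhood_similarity(g1, "EMP", g2, "WORKER")  # 1 shared of 5 total
```

In practice the neighbor names would themselves be compared with a name matcher rather than by exact equality, so that ENAME and NAME could count as a partial match.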
23. Learning-based Schema Matching
• Use machine learning techniques to determine schema matches
• Classification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar concepts
• Similarity is defined according to features of data instances
• Classification is “learned” from a training set
24. Learning-based Schema Matching
25. Combined Schema Matching Approaches
• Use multiple matchers
  ➡ Each matcher focuses on one area (name, etc.)
• Meta-matcher integrates these into one prediction
• Integration may be simple (take average of similarity values) or more complex (see Fagin’s work)
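The simple integration strategy, averaging the similarity values, might look like the sketch below. The function name, the dictionary layout, and the optional weights are assumptions; the weighted variant is only one possible step beyond the plain average and is not the more complex approach the slide refers to.

```python
def combine(scores, weights=None):
    """Combine per-matcher similarity values into one prediction.

    scores:  mapping of matcher name to similarity in [0, 1]
    weights: optional mapping of matcher name to a positive weight
    """
    if not scores:
        raise ValueError("at least one matcher score is required")
    if weights is None:
        return sum(scores.values()) / len(scores)  # plain average
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Using the n-gram and edit distance values for RESP vs. RESPONSIBILITY.
overall = combine({"ngram": 0.17, "edit": 0.29})
```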
26. Schema Integration
• Use the correspondences to create a GCS
• Mainly a manual process, although rules can help
27. Binary Integration Methods
28. N-ary Integration Methods
29. Schema Mapping
• Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target
• Data warehouses ⇒ actual translation
• Data integration systems ⇒ discover mappings that can be used in the query processing phase
• Mapping creation
• Mapping maintenance
30. Mapping Creation
Given
  ➡ A source LCS, S
  ➡ A target GCS, T
  ➡ A set of value correspondences discovered during the schema matching phase
Produce a set of queries that, when executed, will create GCS data instances from the source data.
For each target relation Tk, we look for a query Qk, defined on a (possibly proper) subset of the relations in S, that, when executed, will generate data for Tk from the source relations.
31. Mapping Creation Algorithm
General idea:
• Consider each Tk in turn. Divide Vk, the value correspondences for Tk, into subsets such that each specifies one possible way that values of Tk can be computed.
• Each subset can be mapped to a query that, when executed, would generate some of Tk’s data.
• The union of these queries gives Qk.
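The final step, taking the union of the partial queries, can be pictured with the sketch below. Everything concrete in it is hypothetical: the source relations, their rows, and the two partial queries stand in for whatever the subsets of value correspondences would actually produce.

```python
# Hypothetical source relations in S; names and rows are illustrative only.
source = {
    "EMP": [{"ENO": "E1", "ENAME": "J. Doe"}],
    "CONTRACTOR": [
        {"NUMBER": "E2", "NAME": "M. Smith"},
        {"NUMBER": "E1", "NAME": "J. Doe"},
    ],
}


def from_emp(src):
    """Partial query: one way to compute target tuples (id, name)."""
    return {(r["ENO"], r["ENAME"]) for r in src["EMP"]}


def from_contractor(src):
    """Partial query: another way to compute the same target tuples."""
    return {(r["NUMBER"], r["NAME"]) for r in src["CONTRACTOR"]}


def q_k(src, partial_queries):
    """The mapping query for one target relation: union of the partial queries."""
    result = set()
    for query in partial_queries:
        result |= query(src)
    return result


target_rows = q_k(source, [from_emp, from_contractor])
```

Because the results are sets, a tuple produced by more than one partial query appears only once in the target relation.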