Distributed DBMS © M. T. Özsu & P. Valduriez Ch.4/1
Outline
• Introduction
• Background
• Distributed Database Design
• Database Integration
➡ Schema Matching
➡ Schema Mapping
• Semantic Data Control
• Distributed Query Processing
• Multimedia Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Problem Definition
• Given existing databases with their Local Conceptual Schemas
(LCSs), how can the LCSs be integrated into a Global Conceptual Schema (GCS)?
➡ GCS is also called mediated schema
• Bottom-up design process
Integration Alternatives
• Physical integration
➡ Source databases are integrated and the integrated database is materialized
➡ Data warehouses
• Logical integration
➡ Global conceptual schema is virtual and not materialized
➡ Enterprise Information Integration (EII)
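As a rough illustration of the difference, the sketch below uses Python's built-in `sqlite3` module, with one in-memory database standing in for both the source and the integrated layer (a simplification: real sources are separate systems). The names `src_emp`, `wh_emp` and `gcs_emp` are hypothetical. A materialized copy, as in a data warehouse, does not see later source updates until it is reloaded, whereas a virtual GCS relation is evaluated against the source at query time.

```python
import sqlite3


def compare_integration_styles() -> tuple[int, int]:
    """Return (rows in the materialized copy, rows visible through the view)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical local source relation.
    con.execute("CREATE TABLE src_emp (eno TEXT, ename TEXT)")
    con.execute("INSERT INTO src_emp VALUES ('E1', 'J. Doe')")

    # Physical integration: copy the data into a stored table.
    con.execute("CREATE TABLE wh_emp AS SELECT eno, ename FROM src_emp")
    # Logical integration: store only the definition; data stays in the source.
    con.execute("CREATE VIEW gcs_emp AS SELECT eno, ename FROM src_emp")

    # The source changes after integration.
    con.execute("INSERT INTO src_emp VALUES ('E2', 'M. Smith')")

    materialized = con.execute("SELECT COUNT(*) FROM wh_emp").fetchone()[0]
    virtual = con.execute("SELECT COUNT(*) FROM gcs_emp").fetchone()[0]
    con.close()
    return materialized, virtual


print(compare_integration_styles())  # (1, 2)
```

The stale count in the materialized copy is what a periodic warehouse refresh is meant to correct.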
Data Warehouse Approach
Bottom-up Design
• GCS (also called mediated schema) is defined first
➡ Map LCSs to this schema
➡ As in data warehouses
• GCS is defined as an integration of parts of LCSs
➡ Generate GCS and map LCSs to this GCS
GCS/LCS Relationship
• Local-as-view
➡ The GCS definition is assumed to exist, and each LCS is treated as a view
definition over it
• Global-as-view
➡ The GCS is defined as a set of views over the LCSs
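A minimal global-as-view sketch, again using `sqlite3` with a single in-memory database as a stand-in for two sources. The local relations `db1_emp` and `db2_worker` and their rows are hypothetical; the GCS relation `EMP` from the running example is defined as a view over them, so the GCS is literally a query over the LCSs. Local-as-view runs in the opposite direction (each local relation is described as a view over the GCS) and requires query rewriting, which this sketch does not attempt.

```python
import sqlite3


def gav_emp_rows() -> list[tuple[str, str, str]]:
    con = sqlite3.connect(":memory:")
    # Two hypothetical LCS relations that describe employees with different names.
    con.execute("CREATE TABLE db1_emp (ENO TEXT, ENAME TEXT, TITLE TEXT)")
    con.execute("CREATE TABLE db2_worker (NUMBER TEXT, NAME TEXT, TITLE TEXT)")
    con.execute("INSERT INTO db1_emp VALUES ('E1', 'J. Doe', 'Elect. Eng.')")
    con.execute("INSERT INTO db2_worker VALUES ('E2', 'M. Smith', 'Syst. Anal.')")

    # GAV: the global relation EMP is defined as a query over the local relations.
    con.execute(
        """CREATE VIEW EMP AS
               SELECT ENO, ENAME, TITLE FROM db1_emp
               UNION
               SELECT NUMBER, NAME, TITLE FROM db2_worker"""
    )
    # Queries are posed against the GCS; the view definition maps them to the LCSs.
    rows = con.execute("SELECT ENO, ENAME, TITLE FROM EMP ORDER BY ENO").fetchall()
    con.close()
    return rows


print(gav_emp_rows())
# [('E1', 'J. Doe', 'Elect. Eng.'), ('E2', 'M. Smith', 'Syst. Anal.')]
```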
Database Integration Process
Recall Access Architecture
Database Integration Issues
• Schema translation
➡ Component database schemas are translated into a common intermediate canonical
representation
• Schema generation
➡ Intermediate schemas are used to create a global conceptual schema
Schema Translation
• What is the canonical data model?
➡ Relational
➡ Entity-relationship
✦ DIKE
➡ Object-oriented
✦ ARTEMIS
➡ Graph-oriented
✦ DIPE, TranScm, COMA, Cupid
✦ Preferable with emergence of XML
✦ No common graph formalism
• Mapping algorithms
➡ These are well-known
Schema Generation
• Schema matching
➡ Finding the correspondences between multiple schemas
• Schema integration
➡ Creation of the GCS (or mediated schema) using the correspondences
• Schema mapping
➡ How to map data from local databases to the GCS
• Important: sometimes the GCS is defined first, and schema matching and
schema mapping are then done against this target GCS
Running Example
Relational:
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
ASG(ENO, PNO, RESP, DUR)
PAY(TITLE, SAL)
E-R Model (figure)
Schema Matching
• Schema heterogeneity
➡ Structural heterogeneity
✦ Type conflicts
✦ Dependency conflicts
✦ Key conflicts
✦ Behavioral conflicts
➡ Semantic heterogeneity
✦ More important and harder to deal with
✦ Synonyms, homonyms, hypernyms
✦ Different ontologies
✦ Imprecise wording
Schema Matching (cont’d)
• Other complications
➡ Insufficient schema and instance information
➡ Unavailability of schema documentation
➡ Subjectivity of matching
• Issues that affect schema matching
➡ Schema versus instance matching
➡ Element versus structure level matching
➡ Matching cardinality
Schema Matching Approaches
Linguistic Schema Matching
• Use element names and other textual information (textual
descriptions, annotations)
• May use external sources (e.g., thesauri)
• 〈SC1.element-1 ≈ SC2.element-2, p,s〉
➡ Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p
holds with a similarity value of s
• Schema level
➡ Deal with names of schema elements
➡ Handle cases such as synonyms, homonyms, hypernyms, data type
similarities
• Instance level
➡ Focus on information retrieval techniques (e.g., word frequencies, key terms)
➡ “Deduce” similarities from these
Linguistic Matchers
• Use a set of linguistic (terminological) rules
• Basic rules can be hand-crafted or may be discovered from outside sources
(e.g., WordNet)
• Predicate p and similarity value s
➡ hand-crafted ⇒ specified
➡ discovered ⇒ may be computed or specified by an expert after discovery
• Examples
➡ 〈uppercase names ≈ lower case names, true, 1.0〉
➡ 〈uppercase names ≈ capitalized names, true, 1.0〉
➡ 〈capitalized names ≈ lower case names, true, 1.0〉
➡ 〈DB1.ASG ≈ DB2.WORKS_IN, true, 0.8〉
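The rule tuples above can be represented directly as data. The sketch below is a hypothetical encoding, assuming each rule stores two element names, a predicate p, and a similarity value s, and that rules apply symmetrically; the three case rules are folded into a single case-insensitive comparison. It is not taken from any specific matcher.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    left: str
    right: str
    predicate: Callable[[str, str], bool]  # p in <left ≈ right, p, s>
    similarity: float                      # s


def always(_a: str, _b: str) -> bool:
    """The predicate 'true' used in the example rules."""
    return True


def linguistic_similarity(a: str, b: str, rules: list[Rule]) -> float:
    """Return the best similarity any applicable rule assigns to (a, b)."""
    # Case rules: names that differ only in letter case are treated as equal (s = 1.0).
    if a.lower() == b.lower():
        return 1.0
    best = 0.0
    for rule in rules:
        # Assumed symmetric: <x ≈ y> also covers <y ≈ x>.
        if {rule.left, rule.right} == {a, b} and rule.predicate(a, b):
            best = max(best, rule.similarity)
    return best


RULES = [Rule("DB1.ASG", "DB2.WORKS_IN", always, 0.8)]

print(linguistic_similarity("ENAME", "Ename", RULES))           # 1.0
print(linguistic_similarity("DB2.WORKS_IN", "DB1.ASG", RULES))  # 0.8
print(linguistic_similarity("DB1.PAY", "DB2.WORKS_IN", RULES))  # 0.0
```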
Automatic Discovery of Name Similarities
• Affixes
➡ Common prefixes and suffixes between two element name strings
• N-grams
➡ Comparing how many substrings of length n are common between the two
name strings
• Edit distance
➡ Number of character modifications (insertions, deletions, substitutions) that
need to be performed to convert one string into the other
• Soundex code
➡ Phonetic similarity between names based on their soundex codes
• Also look at data types
➡ Data type similarity may suggest a stronger relationship than the similarity
computed by these methods alone, or may help differentiate between multiple
strings with the same similarity value
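Two of the measures above are simple enough to sketch with the standard library: common affixes (using `os.path.commonprefix`, which compares strings character by character despite living in `os.path`) and the classic American Soundex code. Both functions are illustrative; names are only lowercased here, while practical matchers usually normalize more (e.g., splitting on underscores or camel case).

```python
import os

# American Soundex digit classes; vowels, y, h and w have no digit.
_SOUNDEX = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def common_affixes(a: str, b: str) -> tuple[str, str]:
    """Longest common prefix and suffix of two names (case-insensitive)."""
    a, b = a.lower(), b.lower()
    prefix = os.path.commonprefix([a, b])
    suffix = os.path.commonprefix([a[::-1], b[::-1]])[::-1]
    return prefix, suffix


def soundex(name: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX.get(c, "")
        # Adjacent letters with the same digit are coded once.
        if digit and digit != prev:
            code += digit
        # 'h' and 'w' do not separate same-digit letters; vowels do.
        if c not in "hw":
            prev = digit
    return (code + "000")[:4]


print(common_affixes("Responsibility", "Resp"))  # ('resp', '')
print(common_affixes("ENAME", "PNAME"))          # ('', 'name')
print(soundex("Robert"), soundex("Rupert"))      # R163 R163
```

The `ENAME`/`PNAME` case shows why affixes alone can mislead: the shared suffix reflects a naming habit, not the same concept.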
N-gram Example
• 3-grams of string “Responsibility” are the following:
Res  esp  spo  pon  ons  nsi
sib  ibi  bil  ili  lit  ity
• 3-grams of string “Resp” are
➡ Res
➡ esp
• 3-gram similarity: 2/12 = 0.17 (2 shared 3-grams out of the 12 in the longer string)
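The 3-gram computation can be sketched as follows. Following the example, similarity is taken to be the number of shared n-grams divided by the number of n-grams of the longer name; this normalization is one of several in use and is an assumption here. Names are lowercased and n-grams are kept as sets (all 12 3-grams of “Responsibility” are distinct, so this does not change the example).

```python
def ngrams(s: str, n: int = 3) -> set[str]:
    """Distinct substrings of length n (a name shorter than n yields none)."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    # Shared n-grams relative to the larger n-gram set.
    return len(ga & gb) / max(len(ga), len(gb))


print(sorted(ngrams("Resp")))                               # ['esp', 'res']
print(len(ngrams("Responsibility")))                        # 12
print(round(ngram_similarity("Responsibility", "Resp"), 2))  # 0.17
```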
Edit Distance Example
• Again consider “Responsibility” and “Resp”
• To convert “Responsibility” to “Resp”
➡ Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
• To convert “Resp” to “Responsibility”
➡ Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
• The number of edit operations required is 10
• Similarity is 1 − (10/14) = 0.29, where 14 is the length of the longer string
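A standard dynamic-programming Levenshtein distance reproduces the example. Dividing by the length of the longer string matches the 14 used above; other normalizations exist, so treat this one as an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


print(edit_distance("Responsibility", "Resp"))              # 10
print(round(edit_similarity("Responsibility", "Resp"), 2))  # 0.29
```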
Constraint-based Matchers
• Data always have constraints – use them
➡ Data type information
➡ Value ranges
➡ …
• Examples
➡ RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity
= 0.29 (both low)
➡ If they come from the same domain, this may increase their similarity value
➡ ENO in the relational schema; WORKER.NUMBER and PROJECT.NUMBER in the E-R model
➡ ENO and WORKER.NUMBER may have type INTEGER while
PROJECT.NUMBER may have STRING, which favors ENO ≈ WORKER.NUMBER
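A hypothetical sketch of how a data type constraint can separate candidates that name similarity cannot. Both E-R candidates are named NUMBER, so any name-based score for ENO is the same for both; here it is assumed to be 0.6. The multiplicative penalty of 0.5 for a type mismatch is an illustrative choice, not a value from any particular system.

```python
from typing import NamedTuple

TYPE_MISMATCH_PENALTY = 0.5  # hypothetical weight


class Element(NamedTuple):
    name: str
    dtype: str  # declared data type of the schema element


def constrained_similarity(a: Element, b: Element, name_sim: float) -> float:
    """Scale a name-based similarity by data type compatibility."""
    factor = 1.0 if a.dtype == b.dtype else TYPE_MISMATCH_PENALTY
    return name_sim * factor


eno = Element("ENO", "INTEGER")
candidates = [
    Element("WORKER.NUMBER", "INTEGER"),
    Element("PROJECT.NUMBER", "STRING"),
]

# Assumed output of a linguistic matcher for ENO ≈ NUMBER (same for both).
scores = {c.name: constrained_similarity(eno, c, 0.6) for c in candidates}
print(scores)  # {'WORKER.NUMBER': 0.6, 'PROJECT.NUMBER': 0.3}
```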
Constraint-based Structural Matching
• If two schema elements are structurally similar, then there is a higher
likelihood that they represent the same concept
• Structural similarity:
➡ Same properties (attributes)
➡ “Neighborhood” similarity
✦ Using graph representation
✦ The set of nodes that can be reached within a particular path length from a node
are the neighbors of that node
✦ If two concepts (nodes) have similar sets of neighbors, they are likely to represent
the same concept
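The neighborhood idea can be sketched as a breadth-first search bounded by path length k, followed by a set comparison. The toy graphs below are hypothetical and only loosely modelled on the running example; the correspondences `ENO → NUMBER` and `ENAME → NAME` are assumed to come from an earlier element-level matcher, and Jaccard overlap is one possible choice of set similarity.

```python
from collections import deque


def neighbors(graph: dict[str, list[str]], start: str, k: int) -> set[str]:
    """Nodes reachable from start within k edges (start itself excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen - {start}


def neighborhood_similarity(g1: dict[str, list[str]], n1: str,
                            g2: dict[str, list[str]], n2: str,
                            k: int, mapping: dict[str, str]) -> float:
    """Jaccard overlap of k-neighborhoods, translating g1 names via mapping."""
    left = {mapping.get(x, x) for x in neighbors(g1, n1, k)}
    right = neighbors(g2, n2, k)
    union = left | right
    return len(left & right) / len(union) if union else 0.0


# Hypothetical schema graphs: edges go from a concept to its attributes.
REL = {"EMP": ["ENO", "ENAME", "TITLE"]}
ER = {
    "WORKER": ["NUMBER", "NAME", "TITLE", "SALARY"],
    "PROJECT": ["NUMBER", "NAME", "BUDGET", "LOCATION"],
}
MAPPING = {"ENO": "NUMBER", "ENAME": "NAME"}

print(neighborhood_similarity(REL, "EMP", ER, "WORKER", 1, MAPPING))   # 0.75
print(neighborhood_similarity(REL, "EMP", ER, "PROJECT", 1, MAPPING))  # 0.4
```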
Learning-based Schema Matching
• Use machine learning techniques to determine schema matches
• Classification problem: classify concepts from various schemas into classes
according to their similarity. Those that fall into the same class represent
similar concepts
• Similarity is defined according to features of data instances
• Classification is “learned” from a training set
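To make the classification view concrete, the sketch below swaps in about the simplest learner possible: a 1-nearest-neighbour classifier over two hand-picked instance features (fraction of digit characters and fraction of letters in a column's values). The training columns, class labels and test values are all hypothetical; real learning-based matchers use richer features and learners.

```python
import math


def features(values: list[str]) -> tuple[float, float]:
    """Instance features of a column: fraction of digit and of letter characters."""
    text = "".join(values)
    total = len(text) or 1
    digits = sum(c.isdigit() for c in text)
    letters = sum(c.isalpha() for c in text)
    return digits / total, letters / total


# Hypothetical training set: columns whose concept (class) is already known.
TRAINING = {
    "id": features(["E1", "E2", "E3"]),
    "name": features(["J. Doe", "M. Smith", "A. Lee"]),
    "salary": features(["40000", "34000", "27000"]),
}


def classify(values: list[str]) -> str:
    """1-nearest-neighbour classification by Euclidean distance in feature space."""
    f = features(values)
    return min(TRAINING, key=lambda label: math.dist(f, TRAINING[label]))


print(classify(["B. Casey", "L. Chu"]))  # name
print(classify(["52000", "61000"]))      # salary
print(classify(["W10", "W11"]))          # id
```

Columns from different schemas that land in the same class become match candidates.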
Learning-based Schema Matching
Combined Schema Matching Approaches
• Use multiple matchers
➡ Each matcher focuses on one aspect (names, constraints, structure, etc.)
• Meta-matcher integrates these into one prediction
• Integration may be simple (take average of similarity values) or more
complex (see Fagin’s work)
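Only the simple end of the spectrum is sketched here: a meta-matcher that takes a (optionally weighted) average of the individual matchers' similarity values. The input scores are the RESP/RESPONSIBILITY values from the name-based matchers in this chapter; the weights in the second call are hypothetical. The more complex aggregation schemes referenced above are not shown.

```python
from typing import Optional


def meta_match(scores: dict[str, float],
               weights: Optional[dict[str, float]] = None) -> float:
    """Combine per-matcher similarity values into one prediction (weighted mean)."""
    if not scores:
        return 0.0
    if weights is None:
        weights = {name: 1.0 for name in scores}  # plain average
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


scores = {"ngram": 0.17, "edit_distance": 0.29}
print(round(meta_match(scores), 2))                                        # 0.23
print(round(meta_match(scores, {"ngram": 1.0, "edit_distance": 3.0}), 2))  # 0.26
```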
Schema Integration
• Use the correspondences to create a GCS
• Mainly a manual process, although rules can help
Binary Integration Methods
N-ary Integration Methods
Schema Mapping
• Mapping data from each local database (source) to GCS (target) while
preserving semantic consistency as defined in both source and target.
• Data warehouses ⇒ actual translation
• Data integration systems ⇒ discover mappings that can be used in the
query processing phase
• Two subproblems
➡ Mapping creation
➡ Mapping maintenance
Mapping Creation
Given
➡ A source LCS S, consisting of relations Si
➡ A target GCS T, consisting of relations Tk
➡ A set of value correspondences discovered
during the schema matching phase
Produce a set of queries that, when executed, will create GCS data instances
from the source data.
For each target relation Tk, we are looking for a query Qk, defined on a
(possibly proper) subset of the relations in S, that, when executed, generates
data for Tk from the source relations
Mapping Creation Algorithm
General idea:
• Consider each Tk in turn. Let Vk be the set of value correspondences whose
targets are attributes of Tk. Divide Vk into subsets such that each subset
specifies one possible way in which values of Tk can be computed.
• Each subset can be mapped to a query that, when executed, generates
some of Tk’s data.
• The union of these queries gives Qk
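A heavily simplified sketch of the general idea, assuming each subset of Vk is simply the correspondences that draw on one source relation (so no joins are needed) and that target attributes not covered by a subset are padded with NULL. The full algorithm also considers subsets spanning several source relations, which must be joined, and prunes subsets; none of that is shown. The source relations `WORKER` and `CONSULTANT` and their attributes are hypothetical; the target is `EMP` from the running example.

```python
from collections import defaultdict
from typing import NamedTuple


class Correspondence(NamedTuple):
    source_rel: str
    source_attr: str
    target_attr: str


def partition(vk: list[Correspondence]) -> dict[str, list[Correspondence]]:
    """Simplifying assumption: one subset per source relation (insertion order kept)."""
    groups: dict[str, list[Correspondence]] = defaultdict(list)
    for v in vk:
        groups[v.source_rel].append(v)
    return groups


def subset_query(rel: str, subset: list[Correspondence], target_attrs: list[str]) -> str:
    """SQL generating part of Tk's data from one subset; unmapped attributes are NULL."""
    by_target = {v.target_attr: v.source_attr for v in subset}
    cols = [f"{by_target[t]} AS {t}" if t in by_target else f"NULL AS {t}"
            for t in target_attrs]
    return f"SELECT {', '.join(cols)} FROM {rel}"


def mapping_query(vk: list[Correspondence], target_attrs: list[str]) -> str:
    """Qk as the union of the per-subset queries."""
    return "\nUNION\n".join(subset_query(rel, sub, target_attrs)
                            for rel, sub in partition(vk).items())


V_EMP = [
    Correspondence("WORKER", "NUMBER", "ENO"),
    Correspondence("WORKER", "NAME", "ENAME"),
    Correspondence("WORKER", "TITLE", "TITLE"),
    Correspondence("CONSULTANT", "ID", "ENO"),
    Correspondence("CONSULTANT", "FULLNAME", "ENAME"),
]
print(mapping_query(V_EMP, ["ENO", "ENAME", "TITLE"]))
# SELECT NUMBER AS ENO, NAME AS ENAME, TITLE AS TITLE FROM WORKER
# UNION
# SELECT ID AS ENO, FULLNAME AS ENAME, NULL AS TITLE FROM CONSULTANT
```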

 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Database, 4 Data Integration

  • 1. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.4/1
    Outline
    • Introduction
    • Background
    • Distributed Database Design
    • Database Integration
      ➡ Schema Matching
      ➡ Schema Mapping
    • Semantic Data Control
    • Distributed Query Processing
    • Multimedia Query Processing
    • Distributed Transaction Management
    • Data Replication
    • Parallel Database Systems
    • Distributed Object DBMS
    • Peer-to-Peer Data Management
    • Web Data Management
    • Current Issues
  • 2. Problem Definition
    • Given existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)
      ➡ GCS is also called mediated schema
    • Bottom-up design process
  • 3. Integration Alternatives
    • Physical integration
      ➡ Source databases integrated and the integrated database is materialized
      ➡ Data warehouses
    • Logical integration
      ➡ Global conceptual schema is virtual and not materialized
      ➡ Enterprise Information Integration (EII)
  • 4. Data Warehouse Approach
  • 5. Bottom-up Design
    • GCS (also called mediated schema) is defined first
      ➡ Map LCSs to this schema
      ➡ As in data warehouses
    • GCS is defined as an integration of parts of LCSs
      ➡ Generate GCS and map LCSs to this GCS
  • 6. GCS/LCS Relationship
    • Local-as-view
      ➡ The GCS definition is assumed to exist, and each LCS is treated as a view definition over it
    • Global-as-view
      ➡ The GCS is defined as a set of views over the LCSs
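The global-as-view direction can be illustrated with ordinary SQL views. This is a minimal sketch, assuming two hypothetical local sources (EMP_SITE1 and EMP_SITE2, both invented for illustration) that are simulated inside one in-memory SQLite database via Python's standard sqlite3 module; the GCS relation EMP is declared as a virtual view over them.

```python
import sqlite3

# One in-memory database stands in for two local sources (hypothetical setup).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- LCS of site 1 (hypothetical): employees with their titles
    CREATE TABLE EMP_SITE1 (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    -- LCS of site 2 (hypothetical): same concept, different attribute names
    CREATE TABLE EMP_SITE2 (EMPNO TEXT PRIMARY KEY, NAME TEXT, JOB TEXT);

    INSERT INTO EMP_SITE1 VALUES ('E1', 'J. Doe', 'Elect. Eng.');
    INSERT INTO EMP_SITE2 VALUES ('E2', 'M. Smith', 'Syst. Anal.');

    -- Global-as-view: the GCS relation EMP is defined as a view over the LCSs.
    -- It is virtual (not materialized); a query on EMP is unfolded by SQLite
    -- into queries on the local relations.
    CREATE VIEW EMP (ENO, ENAME, TITLE) AS
        SELECT ENO, ENAME, TITLE FROM EMP_SITE1
        UNION ALL
        SELECT EMPNO, NAME, JOB FROM EMP_SITE2;
""")

rows = conn.execute("SELECT ENO, ENAME FROM EMP ORDER BY ENO").fetchall()
print(rows)  # [('E1', 'J. Doe'), ('E2', 'M. Smith')]
```

Under local-as-view the direction is reversed: each local relation would be described as a view over the GCS, and answering a query then requires rewriting it using those views, which plain SQL views do not do for you.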
  • 7. Database Integration Process
  • 8. Recall Access Architecture
  • 9. Database Integration Issues
    • Schema translation
      ➡ Component database schemas translated to a common intermediate canonical representation
    • Schema generation
      ➡ Intermediate schemas are used to create a global conceptual schema
  • 10. Schema Translation
    • What is the canonical data model?
      ➡ Relational
      ➡ Entity-relationship
        ✦ DIKE
      ➡ Object-oriented
        ✦ ARTEMIS
      ➡ Graph-oriented
        ✦ DIPE, TranScm, COMA, Cupid
        ✦ Preferable with emergence of XML
        ✦ No common graph formalism
    • Mapping algorithms
      ➡ These are well-known
  • 11. Schema Generation
    • Schema matching
      ➡ Finding the correspondences between multiple schemas
    • Schema integration
      ➡ Creation of the GCS (or mediated schema) using the correspondences
    • Schema mapping
      ➡ How to map data from local databases to the GCS
    • Important: sometimes the GCS is defined first, and schema matching and schema mapping are done against this target GCS
  • 12. Running Example
    Relational:
    EMP(ENO, ENAME, TITLE)
    PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
    ASG(ENO, PNO, RESP, DUR)
    PAY(TITLE, SAL)
    [Figure: E-R model of the running example]
  • 13. Schema Matching
    • Schema heterogeneity
      ➡ Structural heterogeneity
        ✦ Type conflicts
        ✦ Dependency conflicts
        ✦ Key conflicts
        ✦ Behavioral conflicts
      ➡ Semantic heterogeneity
        ✦ More important and harder to deal with
        ✦ Synonyms, homonyms, hypernyms
        ✦ Different ontology
        ✦ Imprecise wording
  • 14. Schema Matching (cont’d)
    • Other complications
      ➡ Insufficient schema and instance information
      ➡ Unavailability of schema documentation
      ➡ Subjectivity of matching
    • Issues that affect schema matching
      ➡ Schema versus instance matching
      ➡ Element versus structure level matching
      ➡ Matching cardinality
  • 15. Schema Matching Approaches
  • 16. Linguistic Schema Matching
    • Use element names and other textual information (textual descriptions, annotations)
    • May use external sources (e.g., Thesauri)
    • 〈SC1.element-1 ≈ SC2.element-2, p, s〉
      ➡ Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p holds with a similarity value of s
    • Schema level
      ➡ Deal with names of schema elements
      ➡ Handle cases such as synonyms, homonyms, hypernyms, data type similarities
    • Instance level
      ➡ Focus on information retrieval techniques (e.g., word frequencies, key terms)
      ➡ “Deduce” similarities from these
  • 17. Linguistic Matchers
    • Use a set of linguistic (terminological) rules
    • Basic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)
    • Predicate p and similarity value s
      ➡ Hand-crafted rules: p and s are specified by hand
      ➡ Discovered rules: p and s may be computed, or specified by an expert after discovery
    • Examples
      ➡ 〈uppercase names ≈ lowercase names, true, 1.0〉
      ➡ 〈uppercase names ≈ capitalized names, true, 1.0〉
      ➡ 〈capitalized names ≈ lowercase names, true, 1.0〉
      ➡ 〈DB1.ASG ≈ DB2.WORKS_IN, true, 0.8〉
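One way to make the 〈SC1.element-1 ≈ SC2.element-2, p, s〉 rules concrete is to store each rule as a triple of "which pairs it covers", a predicate, and a similarity value. The sketch below is only an illustration under that assumption: the `Rule` type and the helper names are hypothetical, the three case rules are collapsed into one case-insensitive comparison, and the synonym rule is the hand-crafted DB1.ASG/DB2.WORKS_IN example from the slide.

```python
from typing import Callable, List, NamedTuple, Optional


class Rule(NamedTuple):
    """Hypothetical encoding of <SC1.element-1 ≈ SC2.element-2, p, s>."""
    covers: Callable[[str, str], bool]  # which element pairs the rule talks about
    p: Callable[[str, str], bool]       # predicate that must hold for the pair
    s: float                            # similarity value assigned when it holds


def differ_only_in_case(a: str, b: str) -> bool:
    # One test stands in for the uppercase/lowercase/capitalized rules
    return a.lower() == b.lower()


def always_true(a: str, b: str) -> bool:
    # The predicate "true" used in every example on the slide
    return True


RULES: List[Rule] = [
    Rule(differ_only_in_case, always_true, 1.0),
    # Hand-crafted synonym rule <DB1.ASG ≈ DB2.WORKS_IN, true, 0.8>
    Rule(lambda a, b: {a, b} == {"DB1.ASG", "DB2.WORKS_IN"}, always_true, 0.8),
]


def linguistic_similarity(a: str, b: str) -> Optional[float]:
    """Highest s among rules that cover (a, b) and whose predicate holds."""
    scores = [r.s for r in RULES if r.covers(a, b) and r.p(a, b)]
    return max(scores) if scores else None


print(linguistic_similarity("ENAME", "Ename"))           # 1.0
print(linguistic_similarity("DB1.ASG", "DB2.WORKS_IN"))  # 0.8
print(linguistic_similarity("ENO", "PNO"))               # None
```

Returning `None` rather than 0.0 keeps "no rule applies" distinguishable from "a rule applies with low similarity", which a later combining step may want to treat differently.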
  • 18. Automatic Discovery of Name Similarities
    • Affixes
      ➡ Common prefixes and suffixes between two element name strings
    • N-grams
      ➡ Comparing how many substrings of length n are common between the two name strings
    • Edit distance
      ➡ Number of character modifications (insertions, deletions, substitutions) that need to be performed to convert one string into the other
    • Soundex code
      ➡ Phonetic similarity between names based on their Soundex codes
    • Also look at data types
      ➡ Data type similarity may suggest a stronger relationship than the similarity computed by these methods, or may help differentiate between multiple strings with the same value
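Two of the techniques listed above, affixes and Soundex codes, fit in a few lines. This is a sketch rather than a reference implementation: the affix helpers are case-insensitive by assumption, and the Soundex function follows the common American Soundex variant (H and W do not separate letters with the same code; vowels do), since the slide does not say which variant is meant.

```python
import os

# American Soundex digit classes (vowels, Y, H and W have no digit)
SOUNDEX_DIGITS = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def common_prefix(a: str, b: str) -> str:
    """Affix matcher, prefix side (case-insensitive)."""
    # os.path.commonprefix compares character by character on arbitrary strings
    return os.path.commonprefix([a.lower(), b.lower()])


def common_suffix(a: str, b: str) -> str:
    """Affix matcher, suffix side: common prefix of the reversed strings."""
    return common_prefix(a[::-1], b[::-1])[::-1]


def soundex(name: str) -> str:
    """Four-character American Soundex code."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_DIGITS.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_DIGITS.get(c, "")
        if digit:
            if digit != prev:  # adjacent letters with the same digit are coded once
                code += digit
            prev = digit
        elif c not in "HW":
            prev = ""  # a vowel (or Y) separates letters, so a repeated digit counts again
        # H and W are skipped without resetting prev
    return (code + "000")[:4]  # pad with zeros, truncate to four characters


print(common_prefix("Resp", "Responsibility"))  # resp
print(common_suffix("ENAME", "PNAME"))          # name
print(soundex("Robert"), soundex("Rupert"))     # R163 R163
```

Names with equal Soundex codes, such as "Robert" and "Rupert", would be reported as phonetically similar even though their spelling differs.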
  • 19. N-gram Example
    • 3-grams of string “Responsibility” are the following:
      ➡ Res, esp, spo, pon, ons, nsi, sib, ibi, bil, ili, lit, ity
    • 3-grams of string “Resp” are
      ➡ Res
      ➡ esp
    • 3-gram similarity: 2/12 = 0.17 (2 common 3-grams out of the 12 3-grams of the longer string)
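The computation in the n-gram example can be reproduced directly. This sketch assumes the normalization the example implies (common n-grams divided by the number of n-grams of the longer string); other normalizations, such as the Dice coefficient, are also used in practice and would give different numbers.

```python
def ngrams(s: str, n: int = 3) -> list[str]:
    """All substrings of length n, in order of appearance."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Shared n-grams divided by the n-gram count of the longer string."""
    grams_a, grams_b = set(ngrams(a, n)), set(ngrams(b, n))
    longest = max(len(grams_a), len(grams_b))
    if longest == 0:
        return 0.0  # both strings are shorter than n
    return len(grams_a & grams_b) / longest


print(ngrams("Responsibility"))
# ['Res', 'esp', 'spo', 'pon', 'ons', 'nsi', 'sib', 'ibi', 'bil', 'ili', 'lit', 'ity']
print(round(ngram_similarity("Responsibility", "Resp"), 2))  # 0.17
```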
  • 20. Edit Distance Example
    • Again consider “Responsibility” and “Resp”
    • To convert “Responsibility” to “Resp”
      ➡ Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
    • To convert “Resp” to “Responsibility”
      ➡ Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”
    • The number of edit operations required is 10
    • Similarity is 1 − (10/14) = 0.29, where 14 is the length of the longer string
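The edit distance in this example is the Levenshtein distance, which can be computed with the standard dynamic-programming recurrence over insertions, deletions, and substitutions. The sketch below keeps only one previous row of the table and normalizes by the length of the longer string, as the example does.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # prev holds the previous DP row: distances from a shorter prefix of a
    # to every prefix of b (row 0 is the distance from "" to each prefix).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


print(edit_distance("Responsibility", "Resp"))              # 10
print(round(edit_similarity("Responsibility", "Resp"), 2))  # 0.29
```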
  • 21. Constraint-based Matchers
    • Data always have constraints – use them
      ➡ Data type information
      ➡ Value ranges
      ➡ …
    • Examples
      ➡ RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.29 (low)
      ➡ If they come from the same domain, this may increase their similarity value
      ➡ ENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-R
      ➡ ENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRING
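A constraint-based matcher can be layered on top of a name-based score. The sketch below encodes the ENO/WORKER.NUMBER/PROJECT.NUMBER example with a deliberately simple, hypothetical policy: equal data types add a fixed boost, and different data types rule the pair out. The boost of 0.3, the zeroing rule, and the base score of 0.5 are all assumptions for illustration, not values from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    dtype: str  # e.g. "INTEGER", "STRING"


def constrained_similarity(a: Element, b: Element, name_sim: float,
                           boost: float = 0.3) -> float:
    """Adjust a name-based similarity using data type constraints.

    Hypothetical policy: incompatible types -> 0.0; equal types -> boost,
    capped at 1.0.
    """
    if a.dtype != b.dtype:
        return 0.0
    return min(1.0, name_sim + boost)


eno = Element("ENO", "INTEGER")
worker_no = Element("WORKER.NUMBER", "INTEGER")
project_no = Element("PROJECT.NUMBER", "STRING")

# Hypothetical base score: a name matcher alone cannot tell the two
# NUMBER attributes apart, so both start with the same value.
base = 0.5
print(constrained_similarity(eno, worker_no, base))
print(constrained_similarity(eno, project_no, base))  # 0.0
```

Zeroing incompatible types is a hard filter; a softer design would subtract a penalty instead, which is safer when declared types are unreliable (for example, numeric codes stored as strings).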
  • 22. Constraint-based Structural Matching
    • If two schema elements are structurally similar, then there is a higher likelihood that they represent the same concept
    • Structural similarity:
      ➡ Same properties (attributes)
      ➡ “Neighborhood” similarity
        ✦ Using graph representation
        ✦ The nodes that can be reached from a node within a particular path length are the neighbors of that node
        ✦ If two concepts (nodes) have similar sets of neighbors, they are likely to represent the same concept
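Neighborhood similarity can be sketched as a bounded breadth-first search followed by a set overlap. The assumptions here are loud: schemas are dictionaries mapping a node to its successors, element-level matches are already known from an earlier matcher, the overlap measure is Jaccard, and the E-R attribute names other than NUMBER are a hypothetical simplification of the running example.

```python
from collections import deque

Graph = dict[str, set[str]]


def neighbors(graph: Graph, start: str, max_len: int) -> set[str]:
    """Nodes reachable from start via paths of at most max_len edges (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_len:
            continue  # do not expand beyond the path-length limit
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def neighborhood_similarity(g1: Graph, n1: str, g2: Graph, n2: str,
                            matches: dict[str, str], max_len: int = 1) -> float:
    """Jaccard overlap of the two neighborhoods, after translating g1's
    neighbors into g2's vocabulary via previously found element matches."""
    translated = {matches.get(x, x) for x in neighbors(g1, n1, max_len)}
    other = neighbors(g2, n2, max_len)
    union = translated | other
    return len(translated & other) / len(union) if union else 0.0


relational: Graph = {
    "EMP": {"EMP.ENO", "EMP.ENAME", "EMP.TITLE"},
    "PROJ": {"PROJ.PNO", "PROJ.PNAME", "PROJ.BUDGET", "PROJ.LOC"},
}
er: Graph = {  # hypothetical, simplified E-R attributes
    "WORKER": {"WORKER.NUMBER", "WORKER.NAME", "WORKER.TITLE"},
    "PROJECT": {"PROJECT.NUMBER", "PROJECT.NAME", "PROJECT.BUDGET", "PROJECT.LOCATION"},
}
matches = {  # assumed output of an element-level matcher
    "EMP.ENO": "WORKER.NUMBER", "EMP.ENAME": "WORKER.NAME", "EMP.TITLE": "WORKER.TITLE",
    "PROJ.PNO": "PROJECT.NUMBER", "PROJ.PNAME": "PROJECT.NAME",
    "PROJ.BUDGET": "PROJECT.BUDGET", "PROJ.LOC": "PROJECT.LOCATION",
}

print(neighborhood_similarity(relational, "EMP", er, "WORKER", matches))   # 1.0
print(neighborhood_similarity(relational, "EMP", er, "PROJECT", matches))  # 0.0
```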
  • 23. Learning-based Schema Matching
    • Use machine learning techniques to determine schema matches
    • Classification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar concepts
    • Similarity is defined according to features of data instances
    • Classification is “learned” from a training set
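The classification view above can be illustrated with a very small learner. This sketch swaps in a nearest-centroid classifier for brevity; the slides do not prescribe a particular learner. The instance features (shares of digits, letters, and other characters in a column's values), the class labels, and the sample values are all hypothetical. Columns from different schemas that land in the same class would be proposed as matches.

```python
import math

Vector = tuple[float, float, float]


def features(values: list[str]) -> Vector:
    """Illustrative instance features: share of digits, letters, other chars."""
    text = "".join(values)
    if not text:
        return (0.0, 0.0, 0.0)
    digits = sum(c.isdigit() for c in text) / len(text)
    letters = sum(c.isalpha() for c in text) / len(text)
    return (digits, letters, 1.0 - digits - letters)


def train(examples: list[tuple[str, list[str]]]) -> dict[str, tuple[float, ...]]:
    """Learn one centroid per class from labeled training columns."""
    by_label: dict[str, list[Vector]] = {}
    for label, values in examples:
        by_label.setdefault(label, []).append(features(values))
    return {label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
            for label, vecs in by_label.items()}


def classify(model: dict[str, tuple[float, ...]], values: list[str]) -> str:
    """Assign a column to the class whose centroid is nearest (Euclidean)."""
    f = features(values)
    return min(model, key=lambda label: math.dist(f, model[label]))


TRAINING = [  # hypothetical labeled columns
    ("identifier", ["E1", "E2", "E3"]),
    ("person name", ["J. Doe", "M. Smith", "A. Lee"]),
    ("amount", ["150000", "135000", "250000"]),
]
model = train(TRAINING)
print(classify(model, ["B. Casey", "L. Chu"]))  # person name
print(classify(model, ["310000", "500000"]))    # amount
```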
  • 24. Learning-based Schema Matching
  • 25. Combined Schema Matching Approaches
    • Use multiple matchers
      ➡ Each matcher focuses on one area (name, etc.)
    • Meta-matcher integrates these into one prediction
    • Integration may be simple (take average of similarity values) or more complex (see Fagin’s work)
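The simple integration strategy, averaging the scores of several matchers, can be sketched as a meta-matcher that takes the combining function as a parameter. The two component matchers here (case-insensitive name equality and a 3-gram matcher) are illustrative choices, and `max` is shown only as one alternative combination rule; the more complex schemes the slide points to are not reproduced.

```python
from statistics import mean
from typing import Callable, Iterable, List

Matcher = Callable[[str, str], float]


def name_equality(a: str, b: str) -> float:
    """Matcher 1: 1.0 if names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def trigram_similarity(a: str, b: str) -> float:
    """Matcher 2: shared 3-grams over the 3-gram count of the longer name."""
    ga = {a.lower()[i:i + 3] for i in range(len(a) - 2)}
    gb = {b.lower()[i:i + 3] for i in range(len(b) - 2)}
    longest = max(len(ga), len(gb))
    return len(ga & gb) / longest if longest else 0.0


def meta_match(a: str, b: str, matchers: List[Matcher],
               combine: Callable[[Iterable[float]], float] = mean) -> float:
    """Meta-matcher: run every matcher and combine their scores (default: average)."""
    return combine([m(a, b) for m in matchers])


MATCHERS: List[Matcher] = [name_equality, trigram_similarity]
print(meta_match("ENAME", "ename", MATCHERS))              # 1.0
print(meta_match("RESP", "RESPONSIBILITY", MATCHERS))      # average of 0.0 and 2/12
print(meta_match("RESP", "RESPONSIBILITY", MATCHERS, max)) # best single matcher
```

Passing the combiner in keeps the component matchers independent of how their outputs are merged, so a weighted or learned combination could be substituted without touching them.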
  • 26. Schema Integration
    • Use the correspondences to create a GCS
    • Mainly a manual process, although rules can help
  • 27. Binary Integration Methods
  • 28. N-ary Integration Methods
  • 29. Schema Mapping
    • Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target.
    • Data warehouses ⇒ actual translation
    • Data integration systems ⇒ discover mappings that can be used in the query processing phase
    • Mapping creation
    • Mapping maintenance
  • 30. Mapping Creation
    Given
      ➡ A source LCS with relations S = {Si}
      ➡ A target GCS with relations T = {Tk}
      ➡ A set V of value correspondences discovered during the schema matching phase (Vk denotes the correspondences whose target is Tk)
    Produce a set of queries that, when executed, will create GCS data instances from the source data. For each Tk, we are looking for a query Qk that is defined on a (possibly proper) subset of the relations in S such that, when executed, it will generate data for Tk from the source relations.
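A single mapping query Qk can be illustrated with the running example's EMP and PAY relations in an in-memory SQLite database. The target relation T1(ENO, ENAME, SAL), its value correspondences V1, the sample rows, and the join condition on TITLE are assumptions made for this sketch; deciding the join path is part of what a real mapping-creation algorithm must work out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source LCS relations from the running example (illustrative rows)
    CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
    INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'), ('E2', 'M. Smith', 'Syst. Anal.');
    INSERT INTO PAY VALUES ('Elect. Eng.', 40000), ('Syst. Anal.', 34000);

    -- Hypothetical target GCS relation
    CREATE TABLE T1 (ENO TEXT, ENAME TEXT, SAL INTEGER);
""")

# Hypothetical value correspondences V1: target attribute -> source attribute
V1 = {"ENO": "EMP.ENO", "ENAME": "EMP.ENAME", "SAL": "PAY.SAL"}

# Q1 draws on two source relations, so it needs a join; the join on TITLE
# is an assumption about how EMP and PAY are related.
select_list = ", ".join(f"{src} AS {tgt}" for tgt, src in V1.items())
Q1 = f"SELECT {select_list} FROM EMP JOIN PAY ON EMP.TITLE = PAY.TITLE"

# Data warehouse case: execute Q1 to materialize the target instances.
conn.execute(f"INSERT INTO T1 ({', '.join(V1)}) {Q1}")
rows = conn.execute("SELECT * FROM T1 ORDER BY ENO").fetchall()
print(rows)  # [('E1', 'J. Doe', 40000), ('E2', 'M. Smith', 34000)]
```

In a data integration system, Q1 would instead be kept as the mapping and used during query processing rather than executed up front.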
  • 31. Mapping Creation Algorithm
    General idea:
    • Consider each Tk in turn. Divide Vk into subsets such that each specifies one possible way that values of Tk can be computed.
    • Each subset can be mapped to a query that, when executed, would generate some of Tk’s data.
    • Union of these queries gives