A Survey of Approaches to Automatic Schema Matching
Erhard Rahm, Philip A. Bernstein
VLDB 2001
Introduction
A schema is a representation of the structure of data. Schema matching is a basic problem in many database application domains. We present a taxonomy that covers many of the existing approaches to schema matching.
Match
Match is an operation that takes two schemas as input and produces a mapping between elements of the two schemas that correspond semantically to each other.
Mapping (cont.)
A mapping element relates Cust.C# to Customer.CustID. Its mapping expression is “Cust.C# = Customer.CustID”.
A mapping expression can also combine several elements: “Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact”.
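The two mapping elements above could be held in a small record type. This is a minimal sketch only: the class name `MappingElement`, its field names, and the `cardinality` helper are illustrative assumptions, not something the survey defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingElement:
    """One element of a match result (hypothetical representation)."""
    source_elements: tuple  # elements of the first schema
    target_elements: tuple  # elements of the second schema
    expression: str         # mapping expression, kept as plain text


mapping = [
    MappingElement(("Cust.C#",), ("Customer.CustID",),
                   "Cust.C# = Customer.CustID"),
    MappingElement(("Cust.FirstName", "Cust.LastName"), ("Customer.Contact",),
                   "Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact"),
]


def cardinality(element):
    """Number of related elements on each side, e.g. '1:1' or '2:1'."""
    return f"{len(element.source_elements)}:{len(element.target_elements)}"
```

Keeping the expression as text keeps the sketch independent of any particular expression language.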
Application Domains
- Schema integration.
- Data warehouses.
- E-commerce.
- Semantic query processing.
Architecture for Generic Match (cont.)
Classification of Schema Matching Approaches: Overview
Classification of Schema Matching Approaches
For individual matchers, we consider the following largely orthogonal classification criteria:
1. Instance vs. schema: the matcher uses instance data (data contents) or only schema-level information.
2. Element vs. structure: the match is performed for individual schema elements, such as attributes, or for combinations of elements, such as complex schema structures.
Classification of Schema Matching Approaches (cont.)
3. Language vs. constraint:
   - a linguistic approach is based on names and textual descriptions.
   - a constraint-based approach is based on keys and relationships.
4. Matching cardinality: each mapping element may interrelate one or more elements of the two schemas.
5. Auxiliary information: such as dictionaries, global schemas, previous matching decisions, and user input.
Schema-Level Matchers
Schema-level matchers consider only schema information, such as:
- Name.
- Description.
- Data type.
- Relationship types (part-of, is-a, etc.).
- Constraints.
- Schema structure.
Granularity of Match
Element-level vs. structure-level.
Element-level:
- matches elements at the atomic level, such as attributes in an XML schema.
Structure-level:
- matches combinations of elements that appear together in a structure.
Match Cardinality
Linguistic Approaches
Language-based or linguistic matchers use names and text to find semantically similar schema elements. We discuss two schema-level approaches:
- Name matching.
- Description matching.
Name Matching
Name-based matching matches schema elements with equal or similar names. Similarity of names can be defined and measured in various ways:
1. Equality of names.
   - Beware of homonyms, which share a name but not a meaning, e.g. “line” of business vs. “line” of an order.
2. Equality of canonical names, e.g. CName -> customer name, EmpNO -> employee number.
3. Equality of synonyms, e.g. car ∼ automobile, mark ∼ brand.
Name Matching (cont.)
4. Equality of hypernyms, e.g. book is-a publication and article is-a publication imply book ∼ publication, article ∼ publication, and book ∼ article.
5. Similarity of names based on pronunciation, e.g. ShipTo ≅ Ship2.
6. User-provided name matches, e.g. reportsTo ∼ manager, issue ∼ bug.
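Several of the name-similarity measures listed above can be sketched in a few lines. The lookup tables, the function names `canonical` and `name_similarity`, and the scoring scheme below are all illustrative assumptions. Hypernyms are not covered. Generic string similarity from the standard library's `difflib` stands in for a real pronunciation-based measure.

```python
import re
from difflib import SequenceMatcher

# Hypothetical auxiliary tables; a real matcher would load these from a
# dictionary, a thesaurus, or user input.
ABBREVIATIONS = {"c": "customer", "emp": "employee", "no": "number"}
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"mark", "brand"})}
USER_MATCHES = {frozenset({"reports to", "manager"}), frozenset({"issue", "bug"})}


def canonical(name):
    """Split a CamelCase name into lowercase tokens and expand abbreviations."""
    tokens = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return " ".join(ABBREVIATIONS.get(t.lower(), t.lower()) for t in tokens)


def name_similarity(a, b):
    """Return a similarity in [0, 1] between two element names."""
    ca, cb = canonical(a), canonical(b)
    if ca == cb:                      # equal names or equal canonical names
        return 1.0
    pair = frozenset({ca, cb})
    if pair in SYNONYMS or pair in USER_MATCHES:
        return 1.0
    # Stand-in for pronunciation-based similarity: plain string similarity.
    return SequenceMatcher(None, ca, cb).ratio()
```

With these tables, `name_similarity("CName", "CustomerName")` and `name_similarity("car", "automobile")` both give 1.0, while `ShipTo` and `Ship2` get a high but not perfect score. Homonyms still need other evidence, because equal names always score 1.0 here.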
Description Matching
Descriptions are used to express the intended semantics of schema elements, e.g.:
S1: empn // employee name.
S2: name // name of employee.
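One simple way to compare such descriptions is keyword overlap. The sketch below assumes a tiny stop-word list and uses Jaccard overlap of the remaining words. Both choices are illustrative and are not the method of any particular system.

```python
# Hypothetical stop-word list; real systems would use a fuller one.
STOPWORDS = {"of", "the", "a", "an"}


def description_similarity(d1, d2):
    """Jaccard overlap of the non-stop-word tokens of two descriptions."""
    t1 = {w for w in d1.lower().split() if w not in STOPWORDS}
    t2 = {w for w in d2.lower().split() if w not in STOPWORDS}
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)
```

For the example above, "employee name" and "name of employee" reduce to the same keyword set, so empn and name are matched even though the element names differ.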
Constraint-based Approaches
Schemas often contain constraints that define:
- data types.
- value ranges.
- uniqueness.
- optionality.
- relationship types, and so on.
If the input schemas contain such information, a matcher can use it to determine the similarity of schema elements.
Constraint-based Approaches (cont.)
Type and key information suggest that Born matches Birthdate and that Pno matches either EmpNo or DeptNo.
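The reasoning in this example can be sketched as a filter on data type and key status. The column declarations below are assumptions chosen to be consistent with the example, since the slide's schema figure did not survive in the transcript. The `Column` class and `constraint_candidates` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str
    is_key: bool = False


# Assumed declarations, flattened into one list per schema.
S1 = [Column("EmpNo", "int", is_key=True), Column("EmpName", "string"),
      Column("DeptNo", "int", is_key=True), Column("Birthdate", "date")]
S2 = [Column("Pno", "int", is_key=True), Column("Pname", "string"),
      Column("Dept", "string"), Column("Born", "date")]


def constraint_candidates(column, others):
    """Names of columns in `others` with the same data type and key status."""
    return [o.name for o in others
            if o.data_type == column.data_type and o.is_key == column.is_key]
```

Under these assumptions Born has the single candidate Birthdate, while Pno keeps two candidates, EmpNo and DeptNo. Constraints alone therefore leave an ambiguity that another matcher has to resolve.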
Auxiliary Information
Auxiliary information can improve the matching process:
1. Dictionaries.
2. Thesauri.
3. User-provided information.
Previously matched schemas can also be reused.
Reusing Schema and Mapping Information (cont.)
Instance-Level Approaches
Instance-level data can be used in two ways:
1. To enhance the effectiveness of schema-level matching.
2. To perform instance-level matching on its own.
Most of the approaches discussed previously for schema-level matching can be applied to instance-level matching.
Instance-Level Approaches (cont.)
Instance data can show that DeptName is a better match candidate for Dept than EmpName.
Take EmpNo, DeptNo, and Pno as an example: based on similar value ranges, we match Pno to EmpNo rather than DeptNo.
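The value-range argument can be sketched as an interval-overlap score. The sample values below are invented for illustration, and the `range_overlap` function is a hypothetical helper, not a measure prescribed by the survey.

```python
def range_overlap(values_a, values_b):
    """Jaccard-style overlap of the [min, max] intervals of two numeric columns."""
    lo_a, hi_a = min(values_a), max(values_a)
    lo_b, hi_b = min(values_b), max(values_b)
    intersection = max(0, min(hi_a, hi_b) - max(lo_a, lo_b))
    union = max(hi_a, hi_b) - min(lo_a, lo_b)
    return intersection / union if union else 1.0


# Invented sample instances for the three columns.
pno = [1001, 1420, 1987]
emp_no = [1000, 1200, 1500, 2000]
dept_no = [1, 7, 12, 40]
```

With this data, `range_overlap(pno, emp_no)` is close to 1 and `range_overlap(pno, dept_no)` is 0. This resolves the ambiguity that type and key information leave open.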
Combining Different Matchers
A matcher that uses just one approach is unlikely to achieve as many good match candidates as one that combines several approaches. Combination can be done in two ways:
1. Hybrid matchers.
   - integrate multiple matching criteria.
2. Composite matchers.
   - combine the results of independently executed matchers.
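A composite matcher can be sketched as a function that runs several independent matchers and merges their scores. The weighted average, the threshold, and the two toy matchers below are assumptions made for illustration. Other combination strategies are equally possible.

```python
def same_name(a, b):
    """Toy matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def shared_prefix(a, b):
    """Toy matcher: length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b))


def composite_match(matchers, weights, s1_elements, s2_elements, threshold=0.5):
    """Combine independently executed matchers by a weighted average of scores."""
    total = sum(weights)
    result = {}
    for e1 in s1_elements:
        for e2 in s2_elements:
            score = sum(w * m(e1, e2) for m, w in zip(matchers, weights)) / total
            if score >= threshold:
                result[(e1, e2)] = score
    return result


matches = composite_match([same_name, shared_prefix], [1, 1],
                          ["DeptName", "EmpName"], ["Dept"], threshold=0.2)
```

A hybrid matcher would instead fold both criteria into one scoring function. The composite form keeps each matcher replaceable and lets the weights be tuned separately.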
Sample Approaches From the Literature
- LSD.
- SKAT.
- TransScm.
- ARTEMIS.
Learning Source Descriptions (LSD)
Semantic Knowledge Articulation Tool (SKAT)
A rule-based approach to semi-automatically determine matches between schemas.
Rules are formulated in first-order logic to express match and mismatch relationships.
The user has to provide initial match and mismatch relationships and then approve or reject generated matches.
Schemas are transformed into a graph-based object-oriented database model.
TransScm
Input schemas are transformed into labeled graphs. Edges in the schema graphs represent component relationships.
The matching is performed node by node (element-level, 1:1). There are several matchers, which are checked in a fixed order.
If no match is found, or if a matcher determines multiple match candidates, user intervention is required (to provide a rule or select a match candidate).
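The control flow described above (matchers tried in a fixed order, with the user consulted on no match or an ambiguous match) can be sketched as follows. This is not TransScm's actual code or rule language. The `NeedsUser` exception, the `match_node` function, and the two toy matchers are hypothetical.

```python
class NeedsUser(Exception):
    """Signals that the user must provide a rule or select a candidate."""


def match_node(node, candidates, matchers):
    """Try matchers in a fixed order; the first one yielding candidates decides."""
    for matcher in matchers:
        found = [c for c in candidates if matcher(node, c)]
        if len(found) == 1:
            return found[0]
        if len(found) > 1:
            raise NeedsUser(f"{node}: select one of {found}")
    raise NeedsUser(f"{node}: no match found, please provide a rule")


def by_label(a, b):
    """Toy matcher: identical labels."""
    return a == b


def by_label_ignoring_case(a, b):
    """Toy matcher: labels equal up to letter case."""
    return a.lower() == b.lower()
```

For example, `match_node("Name", ["name", "Address"], [by_label, by_label_ignoring_case])` returns `"name"`, whereas a candidate list containing both `"name"` and `"NAME"` raises `NeedsUser`.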
ARTEMIS
It first computes “affinities” in the range 0 to 1 between attributes:
1. Name affinity.
2. Data type affinity.
3. Structural affinity.
It then completes the schema integration by clustering attributes based on those affinities and constructing views based on the clusters.
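The two steps, combining affinities and then clustering, can be sketched as below. The equal-weight sum and the threshold-based single-link grouping are simplifying assumptions standing in for whatever ARTEMIS actually uses. The function names and the sample affinity values are hypothetical.

```python
def combined_affinity(name_aff, type_aff, struct_aff, weights=(1, 1, 1)):
    """Weighted mean of the three affinities; inputs and result lie in [0, 1]."""
    w_name, w_type, w_struct = weights
    total = w_name + w_type + w_struct
    return (w_name * name_aff + w_type * type_aff + w_struct * struct_aff) / total


def cluster(items, affinity, threshold):
    """Group items whose pairwise affinity reaches the threshold (single link)."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if affinity(a, b) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())


# Invented affinity values for four attributes.
TABLE = {frozenset({"EmpName", "Pname"}): 0.8,
         frozenset({"Birthdate", "Born"}): 0.7}
clusters = cluster(["EmpName", "Pname", "Born", "Birthdate"],
                   lambda a, b: TABLE.get(frozenset({a, b}), 0.1), 0.5)
```

Each resulting cluster groups attributes that would be represented by one attribute of an integrated view.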
Characteristics of Proposed Schema Match Approaches
Conclusion
We used the taxonomy to characterize and compare a variety of previous match implementations. We hope that the taxonomy will be useful to programmers who need to implement a match algorithm.
