SlideShare uma empresa Scribd logo
1 de 16
Mustafa Jarrar
Lecture Notes, Web Data Management (MCOM7348)
University of Birzeit, Palestine
1st Semester, 2013

Introduction to Data Integration

Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
Jarrar © 2013

1
Watch this lecture and download the slides from
http://jarrar-courses.blogspot.com/2013/11/web-data-management.html

Jarrar © 2013

2
Example from the government Domain
Consider all interactions with government agencies in order
to register a new business in Palestine.
Example: Establishing a new Radio Station.

Ministry of
Telecom

Ministry of
Information

Ministry of
National Economy

Jarrar © 2013

Ministry of
Finance

Chamber of
Commerce

3
Example from the government Domain
Consider when the business evolves or changes.
Example: Changing the address of the radio station.
–  Address must be changed in 5 different databases.

Ministry of
Telecom

Ministry of
Information

Ministry of
National Economy

Jarrar © 2013

Ministry of
Finance

Chamber of
Commerce

4
Example from the government Domain
Consider the data registered about the same radio station in
the databases of different ministries and governmental
agencies:

ID

Agency 3

R2563I

Radio Al-Amal

Radio Station Ramallah

Business Name

Activity Type

Province

LM1847

Al-Amal
Broadcast

Radio
Broadcasting

Ramallah
and Bireh

ID

Agency 2

Type

B_ID

Agency 1

Name

Company Name

Company Type

Location

182NS3

Broadcast AlAmal

Broadcasting
Station

Al-Balu’

...

Jarrar © 2013

City

5
Example from the government Domain
From our simple example one can point out to some
challenges in Data Integration:
–  No agreed upon naming (name, business name, company name)
–  No agreed upon meaning (Does ’Activity Type’ mean exactly the
same as ‘Company Type’?)
–  Different Registered Data: Radio Al-Amal, Al-Amal Broadcast, ….
ID

Agency 3

R2563I

Radio Al-Amal

Radio Station Ramallah

Business Name

Activity Type

Province

LM1847

Al-Amal
Broadcast

Radio
Broadcasting

Ramallah
and Bireh

ID

Agency 2

Type

B_ID

Agency 1

Name

Company Name

Company Type

Location

182NS3

Broadcast AlAmal

Broadcasting
Station

Al-Balu’

...

Jarrar © 2013

City

6
Problem is in all domains

Jarrar © 2013

7
Problem is in all domains
Problem is now even more challenging with the Web.
The Data Web envisions the web as a global world-wide
database.
This means that one can query distributed multiple databases
on the web as if he/she is querying a local database.

Jarrar © 2013

8
Challenges of Data Integration:
Heterogeneities in Database Schemas
One can distinguish between several heterogeneities
between different schemas:
–  Name Heterogeneities (difference in used vocabulary).
–  Meaning Heterogeneities (different meaning for the same attribute
in two schemas).
–  Heterogeneities in the structure and type.
–  Heterogeneities in the rules and constraints.
–  Data Model Heterogeneities.

Jarrar © 2013

9
Name and Meaning Heterogeneities
Synonyms – Different names for the same concepts
–  employee, clerk
–  exam, course
–  code, num

Homonyms – Same name for different concepts (different
meanings)
- City as City of birth in one schema,
- City as City of Residence in another schema
Saraly: Net Salary

Section

Salary: Gross Salary

Division

Homonyms

A specialized
division of a
large
organization

Synonyms
Jarrar © 2013

10
Heterogeneities in Structure and Type
Source: Carlo Batini

The same concepts are represented with
different conceptual structures in two schemas:
–  Attribute in one schema and derived value in another schema.
–  Attribute in one schema and entity in another schema.
–  Entity in one schema and relationship in another schema.
–  Different abstraction levels for the same concept in two schemas:
e.g. two entities with homonym names related by an IS-A hierarchy
in two schemas.

Jarrar © 2013

11
Heterogeneities in Structure
Source: Carlo Batini

EXAMPLES:
EMPLOYEE

Person
MAN

Person

GENDER

EMPLOYEE

DEPARTMENT

PROJECT

WOMAN
PROJECT

BOOK

BOOK

PUBLISHER

PUBLISHER

Jarrar © 2013

12
Heterogeneities in Type
Examples:
§  In a single attribute (e.g., Numberic, Alphanumeric).
E.g., the attribute “gender”:
–  Male/Female
–  M/F
–  0/1
§  Year has a four digit domain in one schema and two digit domain
in another schema

§  Different currencies (Euros, US Dollars, etc.)
§  Different measure systems (kilos vs. pounds,
centigrade vs. Fahrenheit.)
§  Different granularities (grams, kilos, etc.)
Jarrar © 2013

13
Heterogeneities in the rules and constraints
Source: Carlo Batini

EXAMPLES:
–  Different cardinalities in the same relationships
–  Key conflicts

Jarrar © 2013

14
Model Heterogeneities
Model Heterogeneities occurs when different databases adheres to
different data models:
–  Relational Data Model, XML, RDF, Object-Oriented, OWL, ...

Solution: Reduce Model Heterogeneity by using one data model.
Example: Convert the Relational Model to RDF graph model.

Jarrar © 2013

15
References and Acknowledgement
•  Carlo Batini: Course on Data Integration. BZU IT Summer School
2011.
•  Stefano Spaccapietra: Information Integration. Presentation at the IFIP
Academy. Porto Alegre. 2005.
•  Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center. Menlo Park, USA. 2009.

Thanks to Anton Deik for helping me preparing this lecture

Jarrar © 2013

16

Mais conteúdo relacionado

Destaque

Chapter12 designing databases
Chapter12 designing databasesChapter12 designing databases
Chapter12 designing databasesDhani Ahmad
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFMustafa Jarrar
 
Jarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course OutlineJarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course OutlineMustafa Jarrar
 
Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsMustafa Jarrar
 
Jarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data IntegrationJarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data IntegrationMustafa Jarrar
 
Jarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and ConstraintsJarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and ConstraintsMustafa Jarrar
 
Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageMustafa Jarrar
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql ProjectMustafa Jarrar
 
Jarrar: RDF Stores: Challenges and Solutions
Jarrar: RDF Stores: Challenges and SolutionsJarrar: RDF Stores: Challenges and Solutions
Jarrar: RDF Stores: Challenges and SolutionsMustafa Jarrar
 
Jarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDFJarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDFMustafa Jarrar
 
Jarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF SchemaJarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF SchemaMustafa Jarrar
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web VesionJarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web VesionMustafa Jarrar
 
Jarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology LanguageJarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology LanguageMustafa Jarrar
 
Jarrar: RDF Stores -Challenges and Solutions
Jarrar: RDF Stores -Challenges and SolutionsJarrar: RDF Stores -Challenges and Solutions
Jarrar: RDF Stores -Challenges and SolutionsMustafa Jarrar
 
Jarrar: OWL (Web Ontology Language)
Jarrar: OWL (Web Ontology Language)Jarrar: OWL (Web Ontology Language)
Jarrar: OWL (Web Ontology Language)Mustafa Jarrar
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebJarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebMustafa Jarrar
 
Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps Mustafa Jarrar
 

Destaque (20)

Chapter12 designing databases
Chapter12 designing databasesChapter12 designing databases
Chapter12 designing databases
 
Jarrar: Zinnar
Jarrar: ZinnarJarrar: Zinnar
Jarrar: Zinnar
 
Jarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDFJarrar: Data Integration and Fusion using RDF
Jarrar: Data Integration and Fusion using RDF
 
Jarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course OutlineJarrar: Knowledge Engineering- Course Outline
Jarrar: Knowledge Engineering- Course Outline
 
Jarrar: Linked Data
Jarrar: Linked DataJarrar: Linked Data
Jarrar: Linked Data
 
Jarrar: Web 2 Data Mashups
Jarrar: Web 2 Data MashupsJarrar: Web 2 Data Mashups
Jarrar: Web 2 Data Mashups
 
Jarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data IntegrationJarrar: Architectural Solutions in Data Integration
Jarrar: Architectural Solutions in Data Integration
 
Jarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and ConstraintsJarrar: Subtype Relations and Constraints
Jarrar: Subtype Relations and Constraints
 
Jarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query LanguageJarrar: SPARQL - RDF Query Language
Jarrar: SPARQL - RDF Query Language
 
Jarrar: Sparql Project
Jarrar: Sparql ProjectJarrar: Sparql Project
Jarrar: Sparql Project
 
Jarrar: RDF Stores: Challenges and Solutions
Jarrar: RDF Stores: Challenges and SolutionsJarrar: RDF Stores: Challenges and Solutions
Jarrar: RDF Stores: Challenges and Solutions
 
Jarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDFJarrar: Data Fusion using RDF
Jarrar: Data Fusion using RDF
 
Jarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF SchemaJarrar: RDFs -RDF Schema
Jarrar: RDFs -RDF Schema
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web VesionJarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
Jarrar: The Next Generation of the Web 3.0: The Semantic Web Vesion
 
Jarrar: RDFa
Jarrar: RDFaJarrar: RDFa
Jarrar: RDFa
 
Jarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology LanguageJarrar: OWL -Web Ontology Language
Jarrar: OWL -Web Ontology Language
 
Jarrar: RDF Stores -Challenges and Solutions
Jarrar: RDF Stores -Challenges and SolutionsJarrar: RDF Stores -Challenges and Solutions
Jarrar: RDF Stores -Challenges and Solutions
 
Jarrar: OWL (Web Ontology Language)
Jarrar: OWL (Web Ontology Language)Jarrar: OWL (Web Ontology Language)
Jarrar: OWL (Web Ontology Language)
 
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic WebJarrar: The Next Generation of the Web 3.0: The Semantic Web
Jarrar: The Next Generation of the Web 3.0: The Semantic Web
 
Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps Jarrar: Conceptual Schema Design Steps
Jarrar: Conceptual Schema Design Steps
 

Semelhante a Jarrar: Introduction to Data Integration

Jarrar: Introduction to data Integration
Jarrar: Introduction to data IntegrationJarrar: Introduction to data Integration
Jarrar: Introduction to data IntegrationMustafa Jarrar
 
Jarrar: Data Schema Integration
Jarrar: Data Schema IntegrationJarrar: Data Schema Integration
Jarrar: Data Schema IntegrationMustafa Jarrar
 
Jarrar: Data Schema Integration
Jarrar: Data Schema Integration Jarrar: Data Schema Integration
Jarrar: Data Schema Integration Mustafa Jarrar
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Jarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology EngineeringJarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology EngineeringMustafa Jarrar
 
Legal Technology - State Bar of CA - Solo/Small Firm Summit
Legal Technology - State Bar of CA - Solo/Small Firm SummitLegal Technology - State Bar of CA - Solo/Small Firm Summit
Legal Technology - State Bar of CA - Solo/Small Firm SummitRon Dolin
 
Jarrar: Introduction to Ontology
Jarrar: Introduction to OntologyJarrar: Introduction to Ontology
Jarrar: Introduction to OntologyMustafa Jarrar
 
All authors contributed equally.An Analysis of Categoric
 All authors contributed equally.An Analysis of Categoric All authors contributed equally.An Analysis of Categoric
All authors contributed equally.An Analysis of CategoricMargaritoWhitt221
 

Semelhante a Jarrar: Introduction to Data Integration (8)

Jarrar: Introduction to data Integration
Jarrar: Introduction to data IntegrationJarrar: Introduction to data Integration
Jarrar: Introduction to data Integration
 
Jarrar: Data Schema Integration
Jarrar: Data Schema IntegrationJarrar: Data Schema Integration
Jarrar: Data Schema Integration
 
Jarrar: Data Schema Integration
Jarrar: Data Schema Integration Jarrar: Data Schema Integration
Jarrar: Data Schema Integration
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Jarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology EngineeringJarrar: Logical Foundation of Ontology Engineering
Jarrar: Logical Foundation of Ontology Engineering
 
Legal Technology - State Bar of CA - Solo/Small Firm Summit
Legal Technology - State Bar of CA - Solo/Small Firm SummitLegal Technology - State Bar of CA - Solo/Small Firm Summit
Legal Technology - State Bar of CA - Solo/Small Firm Summit
 
Jarrar: Introduction to Ontology
Jarrar: Introduction to OntologyJarrar: Introduction to Ontology
Jarrar: Introduction to Ontology
 
All authors contributed equally.An Analysis of Categoric
 All authors contributed equally.An Analysis of Categoric All authors contributed equally.An Analysis of Categoric
All authors contributed equally.An Analysis of Categoric
 

Último

Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxNikitaBankoti2
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 

Último (20)

Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

Jarrar: Introduction to Data Integration

  • 1. Mustafa Jarrar Lecture Notes, Web Data Management (MCOM7348) University of Birzeit, Palestine 1st Semester, 2013 Introduction to Data Integration Dr. Mustafa Jarrar University of Birzeit mjarrar@birzeit.edu www.jarrar.info Jarrar © 2013 1
  • 2. Watch this lecture and download the slides from http://jarrar-courses.blogspot.com/2013/11/web-data-management.html Jarrar © 2013 2
  • 3. Example from the government Domain Consider all interactions with government agencies in order to register a new business in Palestine. Example: Establishing a new Radio Station. Ministry of Telecom Ministry of Information Ministry of National Economy Jarrar © 2013 Ministry of Finance Chamber of Commerce 3
  • 4. Example from the government Domain Consider when the business evolves or changes. Example: Changing the address of the radio station. –  Address must be changed in 5 different databases. Ministry of Telecom Ministry of Information Ministry of National Economy Jarrar © 2013 Ministry of Finance Chamber of Commerce 4
  • 5. Example from the government Domain Consider the data registered about the same radio station in the databases of different ministries and governmental agencies: ID Agency 3 R2563I Radio Al-Amal Radio Station Ramallah Business Name Activity Type Province LM1847 Al-Amal Broadcast Radio Broadcasting Ramallah and Bireh ID Agency 2 Type B_ID Agency 1 Name Company Name Company Type Location 182NS3 Broadcast AlAmal Broadcasting Station Al-Balu’ ... Jarrar © 2013 City 5
  • 6. Example from the government Domain From our simple example one can point out to some challenges in Data Integration: –  No agreed upon naming (name, business name, company name) –  No agreed upon meaning (Does ’Activity Type’ mean exactly the same as ‘Company Type’?) –  Different Registered Data: Radio Al-Amal, Al-Amal Broadcast, …. ID Agency 3 R2563I Radio Al-Amal Radio Station Ramallah Business Name Activity Type Province LM1847 Al-Amal Broadcast Radio Broadcasting Ramallah and Bireh ID Agency 2 Type B_ID Agency 1 Name Company Name Company Type Location 182NS3 Broadcast AlAmal Broadcasting Station Al-Balu’ ... Jarrar © 2013 City 6
  • 7. Problem is in all domains Jarrar © 2013 7
  • 8. Problem is in all domains Problem is now even more challenging with the Web. The Data Web envisions the web as a global world-wide database. This means that one can query distributed multiple databases on the web as if he/she is querying a local database. Jarrar © 2013 8
  • 9. Challenges of Data Integration: Heterogeneities in Database Schemas One can distinguish between several heterogeneities between different schemas: –  Name Heterogeneities (difference in used vocabulary). –  Meaning Heterogeneities (different meaning for the same attribute in two schemas). –  Heterogeneities in the structure and type. –  Heterogeneities in the rules and constraints. –  Data Model Heterogeneities. Jarrar © 2013 9
  • 10. Name and Meaning Heterogeneities Synonyms – Different names for the same concepts –  employee, clerk –  exam, course –  code, num Homonyms – Same name for different concepts (different meanings) - City as City of birth in one schema, - City as City of Residence in another schema Saraly: Net Salary Section Salary: Gross Salary Division Homonyms A specialized division of a large organization Synonyms Jarrar © 2013 10
  • 11. Heterogeneities in Structure and Type Source: Carlo Batini The same concepts are represented with different conceptual structures in two schemas: –  Attribute in one schema and derived value in another schema. –  Attribute in one schema and entity in another schema. –  Entity in one schema and relationship in another schema. –  Different abstraction levels for the same concept in two schemas: e.g. two entities with homonym names related by an IS-A hierarchy in two schemas. Jarrar © 2013 11
  • 12. Heterogeneities in Structure Source: Carlo Batini EXAMPLES: EMPLOYEE Person MAN Person GENDER EMPLOYEE DEPARTMENT PROJECT WOMAN PROJECT BOOK BOOK PUBLISHER PUBLISHER Jarrar © 2013 12
  • 13. Heterogeneities in Type Examples: §  In a single attribute (e.g., Numberic, Alphanumeric). E.g., the attribute “gender”: –  Male/Female –  M/F –  0/1 §  Year has a four digit domain in one schema and two digit domain in another schema §  Different currencies (Euros, US Dollars, etc.) §  Different measure systems (kilos vs. pounds, centigrade vs. Fahrenheit.) §  Different granularities (grams, kilos, etc.) Jarrar © 2013 13
  • 14. Heterogeneities in the rules and constraints Source: Carlo Batini EXAMPLES: –  Different cardinalities in the same relationships –  Key conflicts Jarrar © 2013 14
  • 15. Model Heterogeneities Model Heterogeneities occurs when different databases adheres to different data models: –  Relational Data Model, XML, RDF, Object-Oriented, OWL, ... Solution: Reduce Model Heterogeneity by using one data model. Example: Convert the Relational Model to RDF graph model. Jarrar © 2013 15
  • 16. References and Acknowledgement •  Carlo Batini: Course on Data Integration. BZU IT Summer School 2011. •  Stefano Spaccapietra: Information Integration. Presentation at the IFIP Academy. Porto Alegre. 2005. •  Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI International, Artificial Intelligence Center. Menlo Park, USA. 2009. Thanks to Anton Deik for helping me preparing this lecture Jarrar © 2013 16