What is data inconsistency? 
Data inconsistency exists when different, conflicting versions of
the same data appear in different places. It makes information
unreliable, because it is difficult to determine which version is
correct, and correct, timely decisions cannot be made on the basis
of conflicting information.
Data inconsistency is likely to occur when there is data redundancy,
that is, when a data file or database contains unnecessarily
duplicated data. That is why one major goal of good database design
is to eliminate data redundancy.
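As a minimal sketch of how redundancy leads to inconsistency, the snippet below compares two redundant copies of the same data and reports the keys whose values disagree. The `billing` and `shipping` dictionaries and the function name `find_inconsistencies` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_inconsistencies(*sources):
    """Return {key: set of values} for keys whose value differs between sources."""
    seen = defaultdict(set)
    for source in sources:
        for key, value in source.items():
            seen[key].add(value)
    # A key is inconsistent when its redundant copies hold more than one distinct value
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical redundant copies of customer addresses kept in two places
billing = {"C001": "12 Oak St", "C002": "9 Elm Rd"}
shipping = {"C001": "12 Oak St", "C002": "90 Elm Rd"}

conflicts = find_inconsistencies(billing, shipping)
```

Here only customer `C002` is reported, and nothing in the data itself says which of the two addresses is correct.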
Data Validation?
• Data validation is intended to provide
well-defined guarantees of fitness, accuracy, and
consistency for any kind of user input into an
application or automated system.
• Data validation rules can be defined and designed
using various methodologies, and deployed in
various contexts, according to the requirements.
Figure 1: Simple schematic for a data warehouse. The ETL process extracts information from 
the source databases, transforms it and then loads it into the data warehouse.
• The types of data validation can be grouped
according to the scope, complexity, and purpose of
the validation operations to be carried out.
• For example:
• Data type validation;
• Range and constraint validation;
• Code and cross-reference validation, such as one or
more tests against regular expressions; and
• Structured validation, e.g. LDAP.
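The first three validation types listed above can be sketched in a few lines. This is only an illustration under assumed rules: the field names `age` and `postal_code`, the 0 to 130 range, and the five-digit pattern are hypothetical, not taken from the slides.

```python
import re

# Hypothetical rule: a postal code is exactly five digits
POSTAL_CODE = re.compile(r"\d{5}")


def validate_record(record):
    """Return a list of validation error messages (empty if the record passes)."""
    errors = []
    age = record.get("age")
    # Data type validation (bool is excluded because it is a subclass of int)
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("age must be an integer")
    # Range and constraint validation, only meaningful once the type is right
    elif not 0 <= age <= 130:
        errors.append("age must be between 0 and 130")
    # Test against a regular expression
    postal = record.get("postal_code")
    if not isinstance(postal, str) or not POSTAL_CODE.fullmatch(postal):
        errors.append("postal_code must be five digits")
    return errors


good = validate_record({"age": 42, "postal_code": "90210"})
bad = validate_record({"age": 200, "postal_code": "9021"})
```

Collecting all errors instead of stopping at the first one lets the caller report every problem in a record at once.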
Data Integration in Data Warehousing 
Generally speaking, a data integration system combines the data residing at
different sources and provides a unified, reconciled view of these data, called the
global schema, which can be queried by the user. In the design of a data integration
system, an important aspect is the way in which the global schema is specified, i.e.,
which data model is adopted and what kinds of constraints on the data can be expressed.
A further basic decision is how to specify the relation between the sources and the
global schema. There are two basic approaches to this problem. The first approach,
called global-as-view (GAV), requires that the global schema be expressed in terms of
the data sources. More precisely, every concept of the global schema is associated
with a view over the data sources, so that its meaning is specified in terms of the
data residing at the sources. In the second approach, called local-as-view (LAV), the
global schema is specified independently of the sources, and the relationships between
the global schema and the sources are established by defining every source as a view
over the global schema.
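The global-as-view idea can be sketched as follows: each relation of the global schema is written as a query over the sources. The two source relations, their column names, and the global relation `customer(id, name)` are hypothetical examples, and plain Python lists stand in for real databases.

```python
# Two hypothetical source relations with different layouts
source_crm = [{"cust_id": 1, "full_name": "Ada Lovelace"}]
source_shop = [{"id": 2, "first": "Alan", "last": "Turing"}]


def global_customer():
    """GAV: the global relation customer(id, name) is defined as a view over the sources."""
    for row in source_crm:
        yield {"id": row["cust_id"], "name": row["full_name"]}
    for row in source_shop:
        yield {"id": row["id"], "name": f'{row["first"]} {row["last"]}'}


# The user queries the global schema only and never touches the sources directly
names = sorted(c["name"] for c in global_customer())
```

Under GAV, answering a query amounts to unfolding the view definition, as the function body does here. Under LAV the sources would instead be described as views over the global schema, which this sketch does not attempt to show.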
Problems in data integration 
• Inconsistencies 
• Redundancies
METADATA IN THE DATA WAREHOUSE 
• Metadata is simply defined as data about data: data that is used to
represent other data. For example, the index of a book serves as metadata
for the contents of the book. In other words, metadata is summarized data
that leads us to the detailed data.
• Categories of Metadata
• Metadata can be broadly divided into three categories:
• Business Metadata - data ownership information, business definitions
and changing policies.
• Technical Metadata - database system names, table and column names and
sizes, data types and allowed values. Technical metadata also includes
structural information such as primary and foreign key attributes and
indices.
• Operational Metadata - currency of data and data lineage. Currency of
data means whether the data is active, archived or purged. Lineage of
data means the history of the data's migration and the transformations
applied to it.
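One way to picture the three categories is as fields of a single record describing a warehouse column. The sketch below is an assumption made for illustration: the class name `ColumnMetadata`, its fields, and the sample values are hypothetical and do not come from any particular metadata tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Currency(Enum):
    """Currency of data: whether it is active, archived or purged."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"


@dataclass
class ColumnMetadata:
    # Business metadata
    owner: str
    business_definition: str
    # Technical metadata
    table: str
    column: str
    data_type: str
    # Operational metadata
    currency: Currency = Currency.ACTIVE
    lineage: list = field(default_factory=list)  # ordered history of transformations


meta = ColumnMetadata(
    owner="Sales",
    business_definition="Net order value in EUR",
    table="fact_orders",
    column="net_amount",
    data_type="DECIMAL(12,2)",
)
meta.lineage.append("converted from USD to EUR")
meta.currency = Currency.ARCHIVED
```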
Mapping 
• A basic part of the data warehouse environment
is the mapping from the operational environment
into the data warehouse. The mapping includes a
wide variety of facets, including, but not
limited to:
  • mapping from one attribute to another,
  • conversions,
  • changes in naming conventions,
  • changes in physical characteristics of data,
  • filtering of data, etc.
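The facets listed above can be sketched in one small transformation function. The operational column names (`ORD_NO`, `AMT_CENTS`, `ORD_DT`, `STATUS`), the status code `"X"` for cancelled orders, and the warehouse column names are all hypothetical.

```python
from datetime import date, datetime


def map_to_warehouse(rows):
    """Map hypothetical operational rows to warehouse rows."""
    for row in rows:
        # Filtering of data: cancelled orders are not loaded
        if row["STATUS"] == "X":
            continue
        yield {
            # Attribute-to-attribute mapping with a change in naming convention
            "order_id": row["ORD_NO"],
            # Conversion: integer cents to currency units
            "amount": row["AMT_CENTS"] / 100,
            # Change in physical characteristics: "YYYYMMDD" text to a date value
            "order_date": datetime.strptime(row["ORD_DT"], "%Y%m%d").date(),
        }


operational = [
    {"ORD_NO": 7, "AMT_CENTS": 1250, "ORD_DT": "20240131", "STATUS": "A"},
    {"ORD_NO": 8, "AMT_CENTS": 99, "ORD_DT": "20240201", "STATUS": "X"},
]
warehouse = list(map_to_warehouse(operational))
```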
Retention and Purging? 
• What is data retention and purging?
• There is often a requirement to purge, archive or delete data in the data
warehouse after a certain period of time, known as the retention period of the
data warehouse. Once the retention period is reached, data is purged (deleted)
from the data warehouse or archived to a separate place, usually a low-cost
storage medium (e.g. tape drive).
• Why is data purging required?
• In an ideal scenario, we assume the data warehouse stores data for good.
However, there are some reasons why this might not be a good idea in practice:
• There is a cost overhead associated with the amount of data that we store. This
includes the cost of the storage medium, the infrastructure and the human
resources necessary to manage the data.
• Data volume has a direct impact on the performance of a data warehouse.
More data means more time-consuming sorting and searching operations.
• Business end users of the data warehouse may not be interested in very old
facts and figures. Data may lose its importance and relevance as the business
landscape changes, so retaining such data may not be required.
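A retention rule of this kind can be sketched as a cutoff date: rows older than the cutoff leave the warehouse and go to the archive. The `load_date` field, the 365-day period, and the function name `apply_retention` are hypothetical choices for the example.

```python
from datetime import date, timedelta


def apply_retention(rows, today, retention_days):
    """Split rows into (kept, archived) by comparing load_date with the cutoff."""
    cutoff = today - timedelta(days=retention_days)
    kept = [r for r in rows if r["load_date"] >= cutoff]
    # Rows past the retention period would be moved to low-cost storage or deleted
    archived = [r for r in rows if r["load_date"] < cutoff]
    return kept, archived


facts = [
    {"id": 1, "load_date": date(2015, 3, 1)},
    {"id": 2, "load_date": date(2024, 3, 1)},
]
kept, archived = apply_retention(facts, today=date(2024, 6, 1), retention_days=365)
```

Passing `today` as a parameter rather than reading the system clock keeps the rule deterministic and easy to test.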

What is Data Inconsistency and How to Prevent It
