Data Explorer
- A data profiling tool
Agenda
• Introduction
• Existing System
• Limitations of Existing System
• Proposed Solution
• Project Scope
• Architecture Diagram
• Implementation
• Technology
• Hardware and Software Requirements
• Features and Benefits
• Future Enhancement
Data Explorer – A Data Profiling Tool
Introduction (1/2): Data Profiling
• Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data.
• Applied to the candidate data sources for a data warehouse, it clarifies the structure, content, relationships and derivation rules of the data. Profiling helps to understand anomalies and assess data quality, and also to discover, register, and assess enterprise metadata.
• The purpose of data profiling is both to validate metadata when it is available and to discover metadata when it is not.
• The result of the analysis is used both strategically, to determine the suitability of the candidate source systems and give the basis for an early go/no-go decision, and tactically, to identify problems for later solution design and to level sponsors' expectations.
Introduction (2/2): Purpose of Data Profiling
• Find out whether existing data can easily be used for other purposes.
• Improve the ability to search the data by tagging it with keywords or descriptions, or by assigning it to a category.
• Give metrics on data quality, including whether the data conforms to particular standards or patterns.
• Assess the risk involved in integrating data for new applications, including the challenges of joins.
• Assess whether metadata accurately describes the actual values in the source database.
• Understand data challenges early in any data-intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns.
Existing System
• Initially, data profiling was done by writing complicated SQL queries.
• This suits an analyst or user who knows how to write SQL queries.
• Many users do not know the proper syntax and format for writing SQL queries.
• To overcome this, data profiling tools were introduced.
• Data profiling tools overcome, to some extent, the need to write complex queries.
• Not all types of profiling activities are supported by the tools.
• The user has to learn how to use the tool.
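To make the "hand-written SQL" approach described under Existing System concrete, here is a minimal sketch of one such profiling query. It uses Python's built-in sqlite3 module as a stand-in for a database server; the `customer` table, its columns, and its rows are invented purely for illustration and do not come from the deck.

```python
import sqlite3

# Hypothetical sample table; names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "India"), (2, "India"), (3, "U.S.A"), (4, None)],
)

# One hand-written profiling query for a single column.
# COUNT(*) counts all rows; COUNT(country) skips NULLs.
query = """
    SELECT COUNT(*)                  AS total_rows,
           COUNT(*) - COUNT(country) AS null_rows,
           COUNT(DISTINCT country)   AS distinct_values
    FROM customer
"""
total_rows, null_rows, distinct_values = conn.execute(query).fetchone()
print(total_rows, null_rows, distinct_values)
```

A query like this has to be repeated, and adapted, for every column and every type of analysis, which is the effort a profiling tool is meant to remove.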
Limitations of Existing System
SQL Queries
• Development time is longer.
• The analyst needs to understand the functionality in order to develop the queries.
• Results need to be exported to Excel or Notepad for analysis.
• Traditional approach
Existing Tools
• Complex user interface
• Limited functionality
• Setup and installation
• License cost
• Minimum server requirements
Proposed Solution
• Develop an application that performs all types of profiling
• Easy-to-use interface
• Minimum system requirements
• Feature to export the profiling results to Excel
• Additional feature to indicate data quality, i.e. a Data Quality Indicator
• Support for multiple databases such as Oracle 10g, Oracle 11g, MS SQL Server 2005, MS SQL Server 2008, MySQL, etc.
• Integrated data quality functions to correct erroneous, inconsistent and inaccurate data
Project Scope
• Given the timeline and other constraints, the project will currently support only MS SQL Server.
• The project will include the following types of profiling:
  - Column Profiling
  - Empty Column Analysis
  - Null Rule Analysis
  - Constant Analysis
  - Frequency Analysis
  - Uniqueness Analysis
  - Primary/Composite Key Analysis
• Integrating Data Quality
Architecture Diagram
[Figure: architecture diagram. Labels shown: Analysis Team, Management, Business Users; Data Explorer; Data Profiling; Central Metadata Repository; Capture Issues and Notes; Reporting; MS SQL Server; Other Databases.]
Implementation
• The project will be implemented module by module.
• Each module will be developed and unit tested individually.
• After all modules are complete and unit tested, they will be integrated and System Integration Testing will be performed.
• There will be separate modules for retrieving the databases from the server, retrieving the tables of a selected database, and retrieving the columns of a selected table.
• There will be a separate module for each type of profiling discussed.
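The retrieval modules listed under Implementation (databases, then tables, then columns) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module and SQLite's catalog instead of MS SQL Server, and the function names `list_databases`, `list_tables`, and `list_columns` as well as the `customer` table are hypothetical.

```python
import sqlite3

def list_databases(conn):
    """Return the names of the databases attached to this connection."""
    # PRAGMA database_list yields (seq, name, file) rows.
    return [row[1] for row in conn.execute("PRAGMA database_list")]

def list_tables(conn):
    """Return the names of the user tables in the main database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

def list_columns(conn, table):
    """Return (column name, declared type) pairs for one table."""
    # The table name is interpolated directly into the statement,
    # so it should only ever come from list_tables().
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})")
    return [(row[1], row[2]) for row in rows]

# Hypothetical demo schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, country TEXT)")

for table in list_tables(conn):
    print(table, list_columns(conn, table))
```

Keeping each retrieval step in its own function mirrors the module-wise plan: each one can be unit tested on its own before the profiling modules are built on top.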
Implementation - Profiling Details (1/2)
• Column Profiling
  - Reports the total number of records, null percentage, unique percentage, minimum and maximum value in the column, documented data type, etc.
• Constant Analysis
  - Identifies columns that have more than 0 and fewer than 4 distinct values.
• Null Rule Analysis
  - Finds all columns in a table that contain 100% NULL values.
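The three analyses above (column profiling, constant analysis, null rule analysis) can be sketched on in-memory data as follows. This is a minimal illustration, not the tool's implementation: the function names and sample table are hypothetical, NULL is represented by Python's `None`, and "unique percentage" is assumed to mean distinct non-null values divided by total rows.

```python
def profile_column(values):
    """Basic column profile: row count, null %, unique %, min and max."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "total": total,
        "null_pct": 100.0 * (total - len(non_null)) / total if total else 0.0,
        # Assumption: unique % = distinct non-null values / total rows.
        "unique_pct": 100.0 * len(distinct) / total if total else 0.0,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

def constant_columns(table):
    """Columns with more than 0 and fewer than 4 distinct non-null values."""
    return [
        name for name, values in table.items()
        if 0 < len({v for v in values if v is not None}) < 4
    ]

def all_null_columns(table):
    """Columns in which every value is NULL (None)."""
    return [
        name for name, values in table.items()
        if values and all(v is None for v in values)
    ]

# Hypothetical table stored as column name -> list of values.
table = {
    "id": [1, 2, 3, 4, 5],
    "status": ["A", "B", "A", None, "A"],
    "fax": [None, None, None, None, None],
}

print(profile_column(table["status"]))
print(constant_columns(table))
print(all_null_columns(table))
```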
Implementation - Profiling Details (2/2)
• Unique Analysis
  - Finds all columns in a table that are 100% unique.
• Primary Key / Composite Key Analysis
  - Identifies candidate primary or composite keys, i.e. columns or column combinations whose values are unique for every row.
• Frequency Analysis
  - Finds the number of distinct values in a column and the number of times each value is repeated.
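A rough sketch of the unique, key, and frequency analyses described above, under stated assumptions: rows are plain dictionaries, the function names and sample data are hypothetical, and the key search is a brute-force check of column combinations up to a small size, which would not scale to wide tables.

```python
from collections import Counter
from itertools import combinations

def unique_columns(rows, columns):
    """Columns whose values are all distinct across the rows."""
    return [c for c in columns if len({row[c] for row in rows}) == len(rows)]

def candidate_keys(rows, columns, max_size=2):
    """Minimal column combinations whose value tuples are unique per row."""
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(columns, size):
            # Skip combinations that contain an already found smaller key.
            if any(set(key) <= set(combo) for key in keys):
                continue
            seen = {tuple(row[c] for c in combo) for row in rows}
            if len(seen) == len(rows):
                keys.append(combo)
    return keys

def frequency(rows, column):
    """Map each value in the column to the number of times it occurs."""
    return Counter(row[column] for row in rows)

# Hypothetical order-line data.
rows = [
    {"ref": "R1", "order_id": 1, "line_no": 1, "sku": "A"},
    {"ref": "R2", "order_id": 1, "line_no": 2, "sku": "B"},
    {"ref": "R3", "order_id": 2, "line_no": 1, "sku": "A"},
    {"ref": "R4", "order_id": 2, "line_no": 2, "sku": "A"},
]
columns = ["ref", "order_id", "line_no", "sku"]

print(unique_columns(rows, columns))
print(candidate_keys(rows, columns))
print(frequency(rows, "sku"))
```

Uniqueness in a data sample only suggests a key; whether a combination is a real primary key remains a decision for someone who knows the data.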
Implementation – Data Quality
• Data Unification
  - Profiling results before data unification:

Column     Column Value                 Count
Gender     Male                         50
           M                            10
           His                          5
           male                         60
           Total                        125

Column     Column Value                 Count
Country    USA                          10
           U.S.A                        60
           United States of America     2
           US                           20
           Total                        92
Implementation – Data Quality
• Data Unification
  - Profiling results after data unification:

Column     Column Value     Count
Gender     Male             125

Column     Column Value     Count
Country    U.S.A            92
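One way to picture the unification step is a lookup from each variant spelling to a canonical value. The sketch below replays the counts from the before and after tables; the `CANONICAL` map and the `unify` and `unify_counts` helpers are hypothetical, and the deck does not say how the tool actually decides which values belong together.

```python
from collections import Counter

# Hypothetical synonym map: lower-cased variant -> canonical value.
# The canonical values are the ones shown in the "after" tables.
CANONICAL = {
    "gender": {"male": "Male", "m": "Male", "his": "Male"},
    "country": {
        "usa": "U.S.A",
        "u.s.a": "U.S.A",
        "us": "U.S.A",
        "united states of america": "U.S.A",
    },
}

def unify(column, value):
    """Map a raw value to its canonical form; unknown values pass through."""
    return CANONICAL[column].get(value.strip().lower(), value)

def unify_counts(column, counts):
    """Re-aggregate a value -> count profile after unification."""
    unified = Counter()
    for value, count in counts.items():
        unified[unify(column, value)] += count
    return unified

# Counts taken from the "before" tables.
gender_before = {"Male": 50, "M": 10, "His": 5, "male": 60}
country_before = {"USA": 10, "U.S.A": 60, "United States of America": 2, "US": 20}

print(unify_counts("gender", gender_before))
print(unify_counts("country", country_before))
```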
Implementation – Data Quality
• NULL Removal
  - Profiling results before NULL removal:

Column     Null %
Country    30

Column     Column Value     Count
Country    India            50
           U.S.A            20
           NULL             30
           Total            100
Implementation – Data Quality
• NULL Removal
  - Profiling results after NULL removal:

Column     Null %
Country    0

Column     Column Value     Count
Country    India            50
           U.S.A            20
           N.A.             30
           Total            100

NULL values are defaulted to N.A. (Not Available).
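The NULL removal example can be replayed in a few lines. This is a minimal sketch with hypothetical helper names, representing NULL as Python's `None` and using the counts from the tables above.

```python
from collections import Counter

def null_pct(values):
    """Percentage of NULL (None) entries in a column."""
    return 100.0 * sum(v is None for v in values) / len(values)

def default_nulls(values, default="N.A."):
    """Replace every NULL (None) with a default marker."""
    return [default if v is None else v for v in values]

# Column contents matching the "before" table: 50 India, 20 U.S.A, 30 NULL.
country = ["India"] * 50 + ["U.S.A"] * 20 + [None] * 30
cleaned = default_nulls(country)

print(null_pct(country), null_pct(cleaned))
print(Counter(cleaned))
```

Note that the row total stays at 100: the NULLs are relabelled, not deleted, so the null percentage drops to 0 while "N.A." appears as a new value in the frequency profile.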
Technology
• Data Explorer will be developed on the .NET platform, using C# as the coding language.
• .NET is Microsoft's platform for developing advanced and robust applications.
• .NET provides a wide range of library classes, which eases the development effort and frees time for other activities.
• .NET is called a language-independent platform, as it supports 4 native languages and 21 non-native languages.
• The native languages are those created by Microsoft, i.e. C#, VB.NET, J# and Visual C++.
• Non-native (non-Microsoft) languages supported include Perl, Ruby, etc.
Hardware and Software Requirements
Hardware
• Intel Core 2 Duo processor or above
• 2 GB RAM
• 20 GB HDD
• Printer
• Router for Internet connection
Software
• Windows 2000 / Windows XP / Windows Vista / Windows 7
• Microsoft .NET Framework 3.5
• Microsoft Visual Studio 2008
Features
• Supports multiple databases such as MS SQL Server and Oracle
• Different types of profiling:
  - Column Profiling
  - Constant Analysis
  - Unique Analysis
  - Null Rule Analysis
  - Frequency Analysis
  - Empty Column Analysis
  - Primary / Composite Key Analysis
• Quick analysis and validation of data issues
• Data quality improvement
Benefits
• Quick discovery of data issues
• No need to write queries to profile data
• Time efficient
• Shortens the implementation cycle of major projects
• Improves users' understanding of the data
• Helps discover business knowledge
• Improves data accuracy in corporate databases
Future Enhancement
• Data Explorer can be extended to support unstructured or semi-structured data such as flat files and .csv files.
• It can also be extended to support other relational databases such as MS Access, MySQL, Sybase, etc.
• It can be enhanced by adding Data Quality reports on top of the Data Quality results.
• A mechanism can be added to store the profiling results so that they can be used or referred to at any later point in time.
Thank You