[1] “Gartner Hype Cycle”, Gartner, July 2013
[2] “The Empirical Study on the Factors Affecting Data Warehousing Success”, Md. Ruhul Amin & Md. Taslim Arefin, International Journal of Latest Trends in Computing, Vol. 1, Issue 2, p. 139, December 2010
[3] “The State of Data Quality”, Experian, 2013
[4] “2012 BI and Information Management Trends”, Information Week, November 2011
Data Quality By Design
Managing Data Quality for Data Warehousing and Business Intelligence environments
Introduction
It’s probably reasonable to claim that we have finally reached a stage of pervasiveness with Data Warehouses & Business Intelligence (BI). Data warehouses are now generally accepted as part of mainstream business systems, to the point where they are almost ubiquitous. Certainly, most major businesses would have some form of BI solution these days, and business users now expect their Business Intelligence analyses to be available any time, anywhere, on any device.
Additionally, “Big Data” is getting information and analytics onto the business agenda like never before (although according to Gartner’s recent Hype Cycle analysis [1], key components of “Big Data” solutions such as cloud-based grid computing, in-memory databases and MapReduce are only moving through the “trough of disillusionment” phase). All in all, we really are now living the dream of the digital economy in the Information Age.
And yet, time and again, we hear about the failure of data warehouses. While things may be improving, they’re moving only slowly.
Fast Facts
• More than 50% of DW/BI projects are failing to meet expectations, with more than 50% of data projects having limited acceptance or being outright failures as a result of a lack of attention to data quality issues. [2]
• Common data errors plague 91% of organisations. [3]
• 46% of businesses cite data quality as a barrier to BI adoption. [4]
“Data is the life blood of the business” is a phrase that gets bandied about. Yet all too often, the quality of data is not thought about up front, or at least not until it’s too late. If data really was the “life blood of the business,” you’d expect data quality to be fundamental!
One explanation for data quality being overlooked is that the I.T. department is often responsible for delivering and operating the DWH/BI environment. What ensues ends up being an agenda based on “how do we build it”, not “why are we doing this”. This needs to change.
Data Quality By Design
Data Quality By Design is an approach that aims to make the quality of data a foundational part of business systems design and implementation, both for business processes and business applications. Data warehouses (DW), Business Intelligence (BI) and Analytics initiatives are an excellent entry point to introduce the concept of data quality by design, simply because the implementation of these solutions is entirely focussed on delivering information to business users for decision-making purposes. (For the purposes of this paper, “DW/BI” will hereafter be used to refer to any and all such solution initiatives.)
There are many different approaches and methodologies for DW/BI implementation. However, all DW/BI methods will typically feature four major stages of delivery:
• Data Discovery (Source data analysis)
• Data Modelling (Business functional models, logical models, physical data structures)
• Data Movement (ETL/ELT/ESB)
• Develop Key Outputs (Business Intelligence, standard reports & ad hoc analytics)
Additionally, most modern DW/BI methodologies will follow an iterative, incremental (Agile) delivery approach; sequential (“waterfall”) methods are typically not recommended.
Data Quality techniques can help support each stage in the DW/BI delivery process. The remainder of this paper will explore the relationship between Data Quality techniques and these key stages of DW/BI solution delivery.
Data Quality during Data Discovery
Data Discovery process steps:
• Agreeing scope of suitable source data sets
• Source data analysis
• Logical source-to-target mapping
How Data Quality techniques can help:
• Data Profiling: Identify previously unknown issues with data as part of discovery
• Data Inspection: Increase Data Stewards’ understanding of the data
  - Get more intimate with data
  - Discovery of additional context and narrative
  - Articulation of new business rules (“why is it that…?”)
• Corrective Action Planning: feedback to remediate data issues before solution development / testing.
Data Quality Profiling:
Data quality profiling is an excellent diagnostic method for gaining additional understanding of the data.
Profiling the source data helps inform both business requirements definition and detailed solution designs for data-related projects, as well as enabling data issues to be managed ahead of project implementation.
Profiling may be required at several levels:
• Simple profiling within a single table (e.g. Primary Key constraint violations)
• Medium complexity profiling across two or more interdependent tables (e.g. Foreign Key violations)
• Complex profiling across two or more data sets, with applied business logic (e.g. reconciliation checks)
Field-by-field analysis is required to truly understand the data gaps.
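The three profiling levels can be illustrated with a minimal Python sketch. The table contents, the field names (`crn`, `amount`) and the `ledger_total` control figure are hypothetical, chosen only for illustration; a real profiling exercise would normally run against the source database or a dedicated profiling tool.

```python
from collections import Counter

# Hypothetical sample data: each table is a list of row dictionaries.
customers = [
    {"crn": "C001", "name": "Smith"},
    {"crn": "C002", "name": "Jones"},
    {"crn": "C002", "name": "Patel"},  # duplicate key value
]
orders = [
    {"order_id": 1, "crn": "C001", "amount": 120.0},
    {"order_id": 2, "crn": "C003", "amount": 80.0},  # no such customer
]
ledger_total = 150.0  # assumed control total from a separate source


def primary_key_violations(rows, key):
    """Simple profiling: key values that are missing or occur more than once."""
    counts = Counter(row.get(key) for row in rows)
    return [value for value, n in counts.items() if value is None or n > 1]


def foreign_key_violations(child_rows, parent_rows, key):
    """Medium profiling: child rows whose key has no matching parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


def reconciliation_gap(rows, field, control_total):
    """Complex profiling: summed field minus an independent control total."""
    return sum(row[field] for row in rows) - control_total


print(primary_key_violations(customers, "crn"))
print(foreign_key_violations(orders, customers, "crn"))
print(reconciliation_gap(orders, "amount", ledger_total))
```

Each function returns the offending values or rows rather than a simple pass/fail flag, so the output can feed root-cause analysis directly.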
Any data profiling analysis must not only identify the issues and underlying root causes, but must also identify the business impact of the
data quality problem (effectiveness, efficiency, risk inhibitors).
This will help identify the value of remediating the data. Root cause analysis also helps identify any process outliers and drives out
requirements for remedial action on managing any identified exceptions.
Be sure to profile your data and take baseline measures before applying any remedial actions – this will enable you to measure the impact
of any changes.
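As a rough sketch of baselining, the same measure can be taken before and after remediation and the two results compared. The measure used here (the share of records with a missing value), the `qty` field and the sample rows are assumptions for illustration only.

```python
def null_rate(rows, field):
    """Share of rows in which the field is missing or empty (0.0 for no rows)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


# Hypothetical snapshots of the same data set, before and after remediation.
before = [{"qty": 5}, {"qty": None}, {"qty": 3}, {"qty": None}]
after = [{"qty": 5}, {"qty": 2}, {"qty": 3}, {"qty": None}]

baseline = null_rate(before, "qty")
current = null_rate(after, "qty")
improvement = baseline - current

print(f"baseline={baseline:.0%} current={current:.0%} improvement={improvement:.0%}")
```

Without the stored baseline figure there would be nothing to compare the post-remediation figure against.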
“The data is always right” – a data quality error indicates a failure in the process, the system or the people. Use the data to inform and
drive process change.
RECOMMENDED ACTION POINTS:
• Data Quality Profiling and root-cause analysis to be undertaken as an initiation activity as part of all data warehouse, master data and application migration project phases.
DQ as part of data modelling
Data modelling steps:
• Define Business Functions & requirements
• Identify Logical Data Model subject areas
• Derive target physical data models
How DQ techniques can help:
• Data gap analysis: identify requirements for new data that is useful to a business function, but not currently captured.
• Identify unnecessary data: screen for data sets that currently get captured but that actually have no utility to the business.
• Develop Data Quality Rules: articulate explicit rules for how the data should be represented, and build screening capability (the “data quality firewall”) to ensure that business data is fit for purpose.
• Manage fragmentation and duplication of data: identify more explicitly where data needs to be integrated and replicated, and apply more rigorous controls to ensure that the “golden record” for a given data entity is maintained in the designated system of record only.
The Enterprise Information Model is a crucial tool for driving consistency, integrity and utility of information across the business. A number of discrete steps are identified to derive the overall Enterprise Information Model for the enterprise:
• Establish a core top-down Business Functional Model of the business (a.k.a. Conceptual Business Model). This describes the idealised view of what should be happening in the enterprise from a functional perspective. It establishes the business context(s) that operate upon the data and within which data is then managed. Structured interviews with the business leadership team are a good entry point for capturing the core structure and expectations of the idealised business functional model. (N.B. this is not the process model, nor a model of the organisational structure. Departments are not functions!)
• Derive the core Business Glossary of common informational terms (a high-level, consistent business lexicon that supports shared interpretation and clear semantic meaning of the data). The Business Glossary will be underpinned by a much more detailed and expansive set of Technical Metadata which captures the explicit and context-specific metadata definitions, derivation, business rules, calculation logic and lineage for each atomic information term in use within the enterprise.
• Derive the logical data model for the enterprise, which represents the core entities, attributes and relationships for informational items required to support the identified business functions.
• Model the Reporting Catalogue of key business questions (operational queries, historic reports, analytical and predictive models, data mining models) that the business should be asking in order to monitor and drive its performance. This will almost certainly include questions that are currently not being asked by the business community, and may include questions that currently cannot be answered using existing data. Note too that many of these questions will be cross-functional in nature. Creative thinking is required, including learning from what other industries are doing.
• Derive the CRUD Matrix (Create, Read, Update, Delete) for business data by mapping the Functional Model to the entities in the Logical Data Model. Every business function must act upon at least one entity, and every entity must be acted upon by at least one function.
• Map to the Information Asset Register, which documents the catalogue of current data holdings: what data currently exists within the enterprise, where it exists, for what purpose(s) it is currently used, and by whom.
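The CRUD Matrix rule in the list above (every function acts upon at least one entity, and every entity is acted upon by at least one function) lends itself to a simple automated check. The sketch below is illustrative only: the function names, entity names and matrix contents are hypothetical.

```python
# Hypothetical CRUD matrix: (business function, entity) -> operations performed.
crud = {
    ("Take Order", "Customer"): "R",
    ("Take Order", "Order"): "C",
    ("Dispatch Goods", "Order"): "RU",
}
functions = {"Take Order", "Dispatch Goods", "Forecast Demand"}
entities = {"Customer", "Order", "Product"}


def crud_gaps(crud, functions, entities):
    """Return (functions that act on no entity, entities acted on by no function)."""
    used_functions = {f for (f, _), ops in crud.items() if ops}
    used_entities = {e for (_, e), ops in crud.items() if ops}
    return sorted(functions - used_functions), sorted(entities - used_entities)


idle_functions, orphan_entities = crud_gaps(crud, functions, entities)
print("Functions with no entity:", idle_functions)
print("Entities with no function:", orphan_entities)
```

A non-empty result points either to a gap in the Functional Model or Logical Data Model, or to data that is being captured without a business use.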
RECOMMENDED ACTION POINTS:
• Continue to develop the Enterprise Information Model elements:
  - Business Functional Model, Business Glossary, Technical Metadata, Logical Data Model, Reporting Catalogue, CRUD Matrix, Information Asset Register.
• Adopt, apply and enhance these models and tools within data-related projects as an explicit part of the Systems Development Lifecycle (SDLC).
DQ as a part of Data Movement (ETL/ELT, data migration, integration etc.)
Data Validation and Dimensions of Data Quality
In any data quality validation process, target tolerances & business rules need to be established.
A pragmatic approach to data quality measurement takes into account fitness-for-purpose(s). How this is measured will be up to the individual business; however, a suggested schema for profiling data is based on the following “ACE” dimensions:
• Availability
  - Currency: the reference period of the data; is the data still up-to-date and relevant, or is it “stale”? (e.g. the only available Customer List was last updated in 2007).
  - Timeliness: is the data made available when it is needed? (e.g. I currently don’t receive the Sales Order figures until three days after Month-End.)
• Completeness
  - Individual Record Completeness: Are all required fields provided within each data record, or are there inappropriate NULL values? (e.g. in the sales transaction Ref# 340254 for Mr. Smith on 16/03/14, there is no Sales Quantity recorded).
  - Data Set Completeness: Are all expected records present, or are there gaps in the history? (e.g. In my list of Sales transactions for 2014, there are no rows of data for July.)
• Error Free
  - Integrity/Coherence: Do different data sets join up as intended? (e.g. Do the “Orders” and “Dispatches” data sets both include the Customer Reference Number, and are the CRNs the same?)
  - Uniqueness: Are discrete values expected and preserved? (e.g. do all customers have a unique Customer Reference Number, or do we have two customers with the same CRN?)
  - Validity: Is the data within an acceptable range of known parameters? (e.g. I have a dispatch note for 31st November 2013, a date that does not exist.)
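Several of the “ACE” checks above can be expressed as small functions. The following is a sketch only, reusing the examples from the list; the function names, field names and the 365-day currency threshold are assumptions, not part of any standard.

```python
from datetime import date


def is_current(last_updated, as_of, max_age_days):
    """Currency: True if the data was refreshed within the allowed age."""
    return (as_of - last_updated).days <= max_age_days


def missing_fields(record, required):
    """Record completeness: required fields that are absent or None."""
    return [name for name in required if record.get(name) is None]


def missing_months(rows, year):
    """Data set completeness: months of the given year with no rows at all."""
    present = {row["date"].month for row in rows if row["date"].year == year}
    return [month for month in range(1, 13) if month not in present]


def duplicate_keys(rows, key):
    """Uniqueness: key values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        (dupes if row[key] in seen else seen).add(row[key])
    return sorted(dupes)


def is_valid_date(year, month, day):
    """Validity: True only if the parts form a real calendar date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


# Hypothetical data mirroring the examples in the text.
sale = {"ref": 340254, "customer": "Smith", "quantity": None}
sales_2014 = [{"date": date(2014, m, 1)} for m in range(1, 13) if m != 7]
crns = [{"crn": "C001"}, {"crn": "C002"}, {"crn": "C002"}]

print(is_current(date(2007, 6, 30), date(2014, 3, 16), max_age_days=365))
print(missing_fields(sale, ["ref", "customer", "quantity"]))
print(missing_months(sales_2014, 2014))
print(duplicate_keys(crns, "crn"))
print(is_valid_date(2013, 11, 31))
```

Timeliness and Integrity/Coherence are left out here: the former depends on delivery schedules rather than data content, and the latter is essentially a cross-table key comparison.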
Data Movement Process Steps:
• Detailed source-to-target mapping
• Audit and integrity checks
• Integration code & test
• Reconciliation checks
How DQ techniques can help:
• Data validation: pre-load checks as precursor to incorporating data into the data warehouse
• Data Quality Firewall profile and alert: feedback loop to source (alert, trouble ticket generation).
Data Quality Firewall
Perceptions of data quality are affected by our past experiences. If we can proactively influence the quality of data before a new solution is
delivered, then there is a greater chance of project success.
A “Data Quality Firewall” establishes a visible window on data quality for both business and technical operations. With respect to project delivery, new solutions should apply the above model to identify and incorporate a number of data quality firewall elements into their design and delivery:
• Up-front visibility of known data quality issues, based on proactive profiling of the data (see above).
• Strong metadata management at the Business Glossary level to ensure consistent business understanding of data, and good technical metadata management of data to align with the underlying detailed business rules.
• Using profiling to inform and develop the library of Data Quality Rules that apply to data.
• Ongoing profiling of data prior to Data Warehouse loading, rejecting any data records that fail profiling checks.
• Feedback loop from the DQ firewall to inform and influence changes to both operational systems and data warehouse designs (“improving the data improves the design, which improves the data”).
• Parameter-based targets and tolerances for data quality, with automated alerts and reporting summaries on identified issues that fail DQ profiling checks.
• Automated generation of trouble tickets on any identified issues.
• Preferred approach of “load trusted”: validate data before accepting it into the DW/BI environment. (Compare with “load everything” environments that flag erroneous records for after-the-fact correction, which is not ideal.)
Note too that just because we have the ability to profile a feature doesn’t mean we should! 100% data quality is almost never necessary (at least for analytic decision-making). Pragmatism should be applied to prioritise profiling and remedial efforts.
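A minimal sketch of the “load trusted” idea, with a parameter-based tolerance and an alert flag, might look like the following. The rule names, record layout and 25% tolerance are hypothetical; a production firewall would also persist the rejects and raise trouble tickets, which is omitted here.

```python
def run_firewall(records, rules, tolerance):
    """Split records into trusted and rejected, and flag a tolerance breach.

    rules maps a rule name to a predicate that returns True for a good record.
    tolerance is the highest acceptable share of rejected records (0.0 to 1.0).
    """
    trusted, rejected = [], []
    for record in records:
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            # Keep the failed rule names so they can be fed back to the source.
            rejected.append({"record": record, "failed_rules": failed})
        else:
            trusted.append(record)
    reject_rate = len(rejected) / len(records) if records else 0.0
    return {
        "trusted": trusted,
        "rejected": rejected,
        "reject_rate": reject_rate,
        "alert": reject_rate > tolerance,
    }


# Hypothetical Data Quality Rules and incoming batch.
rules = {
    "quantity_present": lambda r: r.get("quantity") is not None,
    "quantity_positive": lambda r: (r.get("quantity") or 0) > 0,
}
batch = [
    {"ref": 1, "quantity": 4},
    {"ref": 2, "quantity": None},
    {"ref": 3, "quantity": 2},
    {"ref": 4, "quantity": -1},
]

result = run_firewall(batch, rules, tolerance=0.25)
print([r["ref"] for r in result["trusted"]])
print(result["reject_rate"], result["alert"])
```

Only the `trusted` list would be passed on to the warehouse load; the `rejected` list, with the names of the failed rules, is what the feedback loop to the source system would carry.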
RECOMMENDED ACTION POINTS:
• Incorporate “Data Quality Firewall” considerations as a necessary requirement for all new projects.
• Coach Data Quality by Design into Business Analyst and Solution Architecture teams.
For measuring a particular data set against each dimension, appropriate tolerances need to be established to define what is (or is not) deemed to be acceptable for general usage. Issuing a Data Quality Declaration statement (DQD) with each data set provides contextual and narrative guidance to data consumers as to the relative suitability of the data set within a given context. The consumer can then adjudge whether or not the data set is suitable for their purpose.
A DQD might include:
• A simple GREEN/RED indicator for each DQ Dimension, together with a summary statement of the source context of the data, including a description of the provenance, relevance and authority of the data source.
• Interpretability: any explanatory narrative.
• Accessibility: how the data set can be accessed, and its format.
RECOMMENDED ACTION POINTS:
• Consider defining a Data Quality Declaration for each data set.
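One possible way to represent a Data Quality Declaration is as a small record that derives the GREEN/RED indicator for each dimension from a measured score and an agreed tolerance. This is a sketch under stated assumptions: the class name, its fields, the scores and the tolerances are all hypothetical, and the paper does not prescribe any particular structure.

```python
from dataclasses import dataclass

DIMENSIONS = ("Availability", "Completeness", "Error Free")


@dataclass
class DataQualityDeclaration:
    data_set: str
    source_context: str          # provenance, relevance and authority of the source
    scores: dict                 # dimension -> measured pass rate, 0.0 to 1.0
    tolerances: dict             # dimension -> minimum acceptable pass rate
    interpretability: str = ""   # any explanatory narrative
    accessibility: str = ""      # how the data set can be accessed, and its format

    def indicators(self):
        """GREEN if the score meets the tolerance for the dimension, else RED."""
        return {
            d: "GREEN" if self.scores[d] >= self.tolerances[d] else "RED"
            for d in DIMENSIONS
        }


dqd = DataQualityDeclaration(
    data_set="Sales Orders 2014",
    source_context="Hypothetical extract from the order entry system",
    scores={"Availability": 0.99, "Completeness": 0.91, "Error Free": 0.97},
    tolerances={"Availability": 0.95, "Completeness": 0.95, "Error Free": 0.95},
    interpretability="No rows of data for July.",
    accessibility="CSV file on the shared reporting drive",
)
print(dqd.indicators())
```

Keeping the tolerances inside the declaration makes it visible to the consumer what “GREEN” actually meant for that data set.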
Data Quality as part of Business Intelligence
BI Process Steps:
• Define semantic layer (business glossary terms)
• Prepare key outputs (reports, analysis, ad hoc capability)
How DQ techniques can help:
• Publish a Data Quality Declaration to indicate level of trust in the data consumed by a report
• Metadata management, lineage & communication of shared understanding
Conclusions
Data Quality techniques enhance delivery and operation at each stage of the DW/BI environment. This drives better solution design, increased adoption and enhanced business value. Ideally, data quality capabilities will be incorporated during the initial stages of DW/BI solution delivery. However, capability can also be retro-fitted to drive better trust in the DW/BI output.
Ongoing profiling, validation and correction of data help ensure that the contents of the DW/BI solution remain trusted.
About the author
Alan D. Duncan is an evangelist for information and analytics as enablers of better business outcomes, and a member of the Advisory
Board for QFire Software.
An executive-level leader in the field of Information and Data Management Strategy, Governance and Business Analytics, he has over
20 years of international business experience, working with blue-chip companies in a range of industry sectors.
Alan was named by Information-Management.com in their 2012 list of “Top 12 Data Governance gurus you should be following on Twitter”.
Twitter: @Alan_D_Duncan
Blog: http://informationaction.blogspot.com.au

 
What is analytics
What is analyticsWhat is analytics
What is analytics
 
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
[DSC Europe 22] The Making of a Data Organization - Denys Holovatyi
 
Expert Big Data Tips
Expert Big Data TipsExpert Big Data Tips
Expert Big Data Tips
 
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
 
BIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxBIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptx
 

Data Quality in Data Warehouse and Business Intelligence Environments - Discussion Paper

  • 1. Data Quality By Design
Managing Data Quality for Data Warehousing and Business Intelligence environments

[1] “Gartner Hype Cycle”, Gartner, July 2013
[2] “The Empirical Study on the Factors Affecting Data Warehousing Success”, Md. Ruhul Amin & Md. Taslim Arefin, International Journal of Latest Trends in Computing Vol. 1, Issue 2, p. 139, December 2010
[3] “The State of Data Quality”, Experian, 2013
[4] “2012 BI and Information Management Trends”, Information Week, November 2011

Fast Facts
- More than 50% of DW/BI projects are failing to meet expectations, with more than 50% of data projects having limited acceptance or being outright failures as a result of lack of attention to data quality issues. [2]
- Common data errors plague 91% of organisations. [3]
- 46% of businesses cite data quality as a barrier to BI adoption. [4]

“Data is the life blood of the business” is a phrase that gets bandied about. Yet all too often, the quality of data is not thought about, at least not up front, until it’s too late. If data really was the “life blood of the business,” you’d expect data quality to be fundamental!

One explanation for data quality being overlooked is that the I.T. department is often responsible for delivering and operating the DW/BI environment. What ensues ends up being an agenda based on “how do we build it”, not “why are we doing this”. This needs to change.

Data Quality By Design
Data Quality By Design is an approach that aims to make the quality of data a foundational part of business systems design and implementation, both for business processes and business applications. Data warehouses (DW), Business Intelligence (BI) and Analytics initiatives are an excellent entry point to introduce the concept of data quality by design, simply because the implementation of these solutions is entirely focussed on delivering information to business users for decision-making purposes.
(For the purposes of this paper, “DW/BI” will hereafter be used to refer to any and all such solution initiatives.)

There are many different approaches and methodologies for DW/BI implementation. However, all DW/BI methods will typically feature four major stages of delivery:
- Data Discovery (source data analysis)
- Data Modelling (business functional models, logical models, physical data structures)
- Data Movement (ETL/ELT/ESB)
- Develop Key Outputs (Business Intelligence, standard reports & ad hoc analytics)

Additionally, most modern DW/BI methodologies will follow an iterative, incremental (Agile) delivery approach; sequential (“waterfall”) methods are typically not recommended.

Data Quality techniques can help support each stage in the DW/BI delivery process. The remainder of this paper will explore the relationship between Data Quality techniques and these key stages of DW/BI solution delivery.

Introduction
It’s probably reasonable to claim that we have finally reached a stage of pervasiveness with Data Warehouses & Business Intelligence (BI). Data warehouses are now generally accepted as part of mainstream business systems, to the point where they are almost ubiquitous. Certainly, most major businesses would have some form of BI solution these days, and business users now expect their Business Intelligence analyses to be available any time, anywhere, on any device.

Additionally, “Big Data” is getting information and analytics onto the business agenda like never before (although according to Gartner’s recent Hype Cycle analysis, key components of “Big Data” solutions such as cloud-based grid computing, in-memory databases and MapReduce are only moving through the “trough of disillusionment” phase [1]).

All in all, we really are now living the dream of the digital economy in the Information Age. And yet, time and again, we hear about the failure of data warehouses. While things may be improving, they’re moving only slowly.
  • 2. Data Quality during Data Discovery

Data Discovery process steps:
- Agreeing scope of suitable source data sets
- Source data analysis
- Logical source-to-target mapping

How Data Quality techniques can help:
- Data Profiling: identify previously unknown issues with data as part of discovery
- Data Inspection: increase Data Stewards’ understanding of the data
  - Get more intimate with data
  - Discovery of additional context and narrative
  - Articulation of new business rules (“why is it that…?”)
- Corrective Action Planning: feedback to remediate data issues before solution development / testing.

Data Quality Profiling:
Data quality profiling is an excellent diagnostic method for gaining additional understanding of the data. Profiling the source data helps inform both business requirements definition and detailed solution designs for data-related projects, as well as enabling data issues to be managed ahead of project implementation.

Profiling may be required at several levels:
- Simple profiling within a single table (e.g. Primary Key constraint violations)
- Medium complexity profiling across two or more interdependent tables (e.g. Foreign Key violations)
- Complex profiling across two or more data sets, with applied business logic (e.g. reconciliation checks)

Field-by-field analysis is required to truly understand the data gaps.

Any data profiling analysis must not only identify the issues and underlying root causes, but must also identify the business impact of the data quality problem (effectiveness, efficiency, risk inhibitors). This will help identify the value of remediating the data. Root cause analysis also helps identify any process outliers and drives out requirements for remedial action on managing any identified exceptions.

Be sure to profile your data and take baseline measures before applying any remedial actions; this will enable you to measure the impact of any changes.
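As a rough illustration of the three profiling levels described above, the sketch below uses plain Python over lists of dictionaries. The function names, field names (`crn`, `amount`) and sample rows are hypothetical and invented for this example; in practice a profiling tool or SQL would normally do this work.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Simple profiling: key values that occur more than once in one table."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def orphan_keys(child_rows, fk, parent_rows, pk):
    """Medium profiling: foreign-key values with no matching parent row."""
    parents = {row[pk] for row in parent_rows}
    return sorted({row[fk] for row in child_rows} - parents)

def reconcile_totals(source_rows, target_rows, field, tolerance=0):
    """Complex profiling: compare a control total between two data sets."""
    source_total = sum(row[field] for row in source_rows)
    target_total = sum(row[field] for row in target_rows)
    difference = source_total - target_total
    return {
        "source": source_total,
        "target": target_total,
        "difference": difference,
        "ok": abs(difference) <= tolerance,
    }

# Hypothetical sample data, invented for illustration only.
customers = [{"crn": "C1"}, {"crn": "C2"}, {"crn": "C2"}]
orders = [
    {"order_id": 1, "crn": "C1", "amount": 100},
    {"order_id": 2, "crn": "C9", "amount": 50},
]
warehouse_orders = [{"order_id": 1, "crn": "C1", "amount": 100}]

print(duplicate_keys(customers, "crn"))              # primary-key violations
print(orphan_keys(orders, "crn", customers, "crn"))  # foreign-key violations
print(reconcile_totals(orders, warehouse_orders, "amount"))
```

Running the same functions before and after remediation gives a simple baseline comparison of the kind recommended above.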
“The data is always right”: a data quality error indicates a failure in the process, the system or the people. Use the data to inform and drive process change.

RECOMMENDED ACTION POINTS:
- Data Quality Profiling and root-cause analysis to be undertaken as an initiation activity as part of all data warehouse, master data and application migration project phases.

DQ as part of data modelling

Data modelling steps:
- Define Business Functions & requirements
- Identify Logical Data Model subject areas
- Derive target physical data models

How DQ techniques can help:
- Data gap analysis: identify requirements for new data that is useful to a business function, but not currently captured.
- Identify unnecessary data: screen for data sets that currently get captured but that actually have no utility to the business.
- Develop Data Quality Rules: articulate explicit rules for how the data should be represented, and build screening capability (the “data quality firewall”) to ensure that business data is fit for purpose.
- Manage fragmentation and duplication of data: identify more explicitly where data needs to be integrated and replicated, and apply more rigorous controls to ensure that the “golden record” for a given data entity is maintained in the designated system of record only.
  • 3. The Enterprise Information Model is a crucial tool for driving consistency, integrity and utility of information across the business. A number of discrete steps are identified to derive the overall Enterprise Information Model:
- Establish a core top-down Business Functional Model of the business (a.k.a. Conceptual Business Model). This describes the idealised view of what should be happening in the enterprise from a functional perspective. It establishes the business context(s) that operate upon the data and within which data is then managed. Structured interviews with the business leadership team are a good entry point for capturing the core structure and expectations of the idealised business functional model. (N.B. this is not the process model, nor a model of the organisational structure. Departments are not functions!)
- Derive the core Business Glossary of common informational terms (a high-level, consistent business lexicon that supports shared interpretation and clear semantic meaning of the data). The Business Glossary will be underpinned by a much more detailed and expansive set of Technical Metadata, which captures the explicit and context-specific metadata definitions, derivation, business rules, calculation logic and lineage for each atomic information term in use within the enterprise.
- Derive the logical data model for the enterprise, which represents the core entities, attributes and relationships for informational items required to support the identified business functions.
- Model the Reporting Catalogue of key business questions (operational queries, historic reports, analytical and predictive models, data mining models) that the business should be asking in order to monitor and drive its performance. This will almost certainly include questions that are currently not being asked by the business community, and may include questions that currently cannot be answered using existing data.
  Note too that many of these questions will be cross-functional in nature. Creative thinking is required, including learning from what other industries are doing.
- Derive the CRUD Matrix (Create, Read, Update, Delete) for business data by mapping the Functional Model to the entities in the Logical Data Model. Every business function must act upon at least one entity, and every entity must be acted upon by at least one function.
- Map to the Information Asset Register, which documents the catalogue of current data holdings: what data currently exists within the enterprise, where it exists, for what purpose(s) it is currently used, and by whom.

RECOMMENDED ACTION POINTS:
- Continue to develop the Enterprise Information Model elements:
  - Business Functional Model, Business Glossary, Technical Metadata, Logical Data Model, Reporting Catalogue, CRUD Matrix, Information Asset Register.
- Adopt, apply and enhance these models and tools within data-related projects as an explicit part of the Systems Development Lifecycle (SDLC).

DQ as a part of Data Movement (ETL/ELT, data migration, integration etc.)

Data Validation and Dimensions of Data Quality
In any data quality validation process, target tolerances & business rules need to be established. A pragmatic approach to data quality measurement takes into account fitness-for-purpose(s). How this is measured will be up to the individual business; however, a suggested schema for profiling data is based on the following “ACE” dimensions:
- Availability
  - Currency: the reference period of the data; is the data still up-to-date and relevant, or is it “stale”? (e.g. the only available Customer List was last updated in 2007.)
  - Timeliness: is the data made available when it is needed? (e.g. I currently don’t receive the Sales Order figures until three days after Month-End.)
- Completeness
  - Individual Record Completeness: are all required fields provided within each data record, or are there inappropriate NULL values? (e.g. in the sales transaction Ref# 340254 for Mr. Smith on 16/03/14, there is no Sales Quantity recorded.)
  - Data Set Completeness: are all expected records present, or are there gaps in the history? (e.g. in my list of Sales transactions for 2014, there are no rows of data for July.)
- Error Free
  - Integrity/Coherence: do different data sets join up as intended? (e.g. do the “Orders” and “Dispatches” data sets both include the Customer Reference Number, and are the CRNs the same?)
  - Uniqueness: are discrete values expected and preserved? (e.g. do all customers have a unique Customer Reference Number, or do we have two customers with the same CRN?)
  - Validity: is the data within an acceptable range of known parameters? (e.g. I have a dispatch note for 31st November 2013.)

Data Movement Process Steps:
- Detailed source-to-target mapping
- Audit and integrity checks
- Integration code & test
- Reconciliation checks

How DQ techniques can help:
- Data validation: pre-load checks as precursor to incorporating data into the data warehouse
- Data Quality Firewall profile and alert: feedback loop to source (alert, trouble ticket generation).
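Several of the “ACE” checks described above can be expressed as small pre-load tests. The following is a minimal sketch using only the Python standard library; the function names, the record layout (`ref`, `date`, `quantity`) and the sample rows are assumptions made for illustration, and the tolerances would be set by the individual business.

```python
from datetime import date

def is_valid_date(year, month, day):
    """Validity: does the calendar date actually exist?"""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

def is_current(last_updated, as_of, max_age_days):
    """Currency: was the data refreshed within the agreed tolerance?"""
    return (as_of - last_updated).days <= max_age_days

def incomplete_records(rows, required_fields):
    """Record completeness: rows with a missing (None) required field."""
    return [row for row in rows
            if any(row.get(field) is None for field in required_fields)]

def missing_months(rows, year, date_field="date"):
    """Data set completeness: months of the year with no rows at all."""
    present = {row[date_field].month for row in rows
               if row[date_field].year == year}
    return [month for month in range(1, 13) if month not in present]

# Hypothetical sales rows for 2014 with no data for July (illustration only).
sales = [{"ref": 1000 + m, "date": date(2014, m, 15), "quantity": 1}
         for m in range(1, 13) if m != 7]
sales[2]["quantity"] = None  # one record with no Sales Quantity

print(is_valid_date(2013, 11, 31))                          # dispatch note date
print(is_current(date(2007, 6, 30), date(2014, 3, 16), 365))  # stale list?
print(len(incomplete_records(sales, ["ref", "quantity"])))
print(missing_months(sales, 2014))
```

Uniqueness and integrity checks follow the same pattern as primary-key and foreign-key profiling, so they are not repeated here.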
  • 4. Data Quality Firewall
Perceptions of data quality are affected by our past experiences. If we can proactively influence the quality of data before a new solution is delivered, then there is a greater chance of project success.

A “Data Quality Firewall” establishes a visible window on data quality for both business and technical operations. With respect to project delivery, new solutions should apply the above model to identify and incorporate a number of data quality firewall elements into their design and delivery:
- Up-front visibility of known data quality issues, based on proactive profiling of the data (see above).
- Strong metadata management at the Business Glossary level to ensure consistent business understanding of data, and good technical metadata management of data to align with the underlying detailed business rules.
- Using profiling to inform and develop the library of Data Quality Rules that apply to data.
- Ongoing profiling of data prior to Data Warehouse loading, rejecting any data records that fail profiling checks.
- Feedback loop from the DQ firewall to inform and influence changes to both operational systems and data warehouse designs (“improving the data improves the design, which improves the data”).
- Parameter-based targets and tolerances for data quality, with automated alerts and reporting summaries on identified issues that fail DQ profiling checks.
- Automated generation of trouble tickets on any identified issues.
- Preferred approach of “load trusted”: validate data before accepting it into the DW/BI environment. (Compare with “load everything” environments that flag erroneous records for after-the-fact correction, which is not ideal.)

Note too that just because we have the ability to profile a feature doesn’t mean we should! 100% data quality is almost never necessary (at least for analytic decision-making). Pragmatism should be applied to prioritise profiling and remedial efforts.
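The “load trusted” idea, with a parameter-based tolerance and an alert, might be sketched as follows. This is an illustrative outline only: `run_firewall`, the rule names, the 5% tolerance and the sample batch are all hypothetical, and a real firewall would also raise trouble tickets and feed results back to the source systems.

```python
def run_firewall(records, rules, tolerance=0.05):
    """Split a batch into trusted and rejected records before loading.

    `rules` maps a rule name to a function returning True when a record
    passes. An alert is flagged when the reject rate exceeds `tolerance`.
    """
    trusted, rejected = [], []
    for record in records:
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            rejected.append({"record": record, "failed_rules": failed})
        else:
            trusted.append(record)
    reject_rate = len(rejected) / len(records) if records else 0.0
    return {
        "trusted": trusted,
        "rejected": rejected,
        "reject_rate": reject_rate,
        "alert": reject_rate > tolerance,
    }

# Hypothetical Data Quality Rules, invented for illustration only.
rules = {
    "crn_present": lambda r: bool(r.get("crn")),
    "quantity_positive": lambda r: isinstance(r.get("quantity"), int)
    and r["quantity"] > 0,
}

# Hypothetical batch: one record has no Sales Quantity.
batch = [
    {"crn": "C1", "quantity": 2},
    {"crn": "C2", "quantity": 1},
    {"crn": "C3", "quantity": None},
    {"crn": "C4", "quantity": 5},
]

outcome = run_firewall(batch, rules)
print(len(outcome["trusted"]), "records trusted for loading")
for item in outcome["rejected"]:
    print("rejected:", item["record"], "failed:", item["failed_rules"])
print("alert raised:", outcome["alert"])
```

Only the trusted records would be passed to the warehouse load; the rejected list, with the names of the failed rules, is what would be fed back to the source.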
RECOMMENDED ACTION POINTS:
- Incorporate “Data Quality Firewall” considerations as a necessary requirement for all new projects.
- Coach Data Quality by Design into Business Analyst and Solution Architecture teams.

For measuring a particular data set against each dimension, appropriate tolerances need to be established to define what is (or is not) deemed to be acceptable for general usage. A Data Quality Declaration (DQD) statement issued with each data set provides contextual and narrative guidance to data consumers as to the relative suitability of the data set within a given context. The consumer can then judge whether or not the data set is suitable for their purpose.

The DQD gives a simple GREEN/RED indicator for each DQ dimension, together with a summary statement of the source context of the data, including a description of the provenance, relevance and authority of the data source.

RECOMMENDED ACTION POINTS:
- Consider defining a Data Quality Declaration for each data set.

- Interpretability: any explanatory narrative.
- Accessibility: how the data set can be accessed, and its format.

Data Quality as part of Business Intelligence

BI Process Steps:
- Define semantic layer (business glossary terms)
- Prepare key outputs (reports, analysis, ad hoc capability)

How DQ techniques can help:
- Publish a Data Quality Declaration to indicate the level of trust in the data consumed by a report
- Metadata management, lineage & communication of shared understanding

Conclusions
Data Quality techniques enhance delivery and operation at each stage of delivering the DW/BI environment. This drives better solution design, increased adoption and enhanced business value.

Ideally, data quality capabilities will be incorporated during the initial stages of DW/BI solution delivery. However, capability can also be retro-fitted to drive better trust in the DW/BI output. Ongoing profiling, validation and correction of data ensures the contents of the DW/BI solution remain trusted.

About the author
Alan D. Duncan is an evangelist for information and analytics as enablers of better business outcomes, and a member of the Advisory Board for QFire Software. An executive-level leader in the field of Information and Data Management Strategy, Governance and Business Analytics, he has over 20 years of international business experience, working with blue-chip companies in a range of industry sectors. Alan was named by Information-Management.com in their 2012 list of “Top 12 Data Governance gurus you should be following on Twitter”.
Twitter: @Alan_D_Duncan
Blog: http://informationaction.blogspot.com.au