( Big ) Data Management
Data Quality
Global Concepts in 5 slides
2016
Nicolas SARRAMAGNA
https://fr.linkedin.com/pub/nicolas-sarramagna/19/941/587
CONTENTS
 Introduction
 What / Why
 How
COMPAGNIE PLASTIC OMNIUM
CONFIDENTIAL
Data Quality in Data Management
MARCH 2015
 DATA MANAGEMENT
 Multiple modules
 BIG DATA
 Velocity, Volume, Variety, Veracity, Value
(Diagram) Data Management modules: Collect, Storage, Data Mining / Machine Learning, Data Viz, Governance, Security, Master Data, Data quality
Data quality – What
 DATA QUALITY - VERACITY
 Applies to any data store (not only inside a Master Data Management product)
 Quality = metrics computed on your data
 METRICS
 Which metrics apply depends on the data
 EXAMPLES OF METRICS
 Accuracy: the data is represented exactly (e.g. correct spelling). Impact: table (e.g. nb correct values / nb total)
 Completeness: all fields are filled (e.g. no empty values). Impact: table (e.g. nb empty values / nb total, lower is better)
 Conformity: values respect a specific format (e.g. country code checked with a regex). Impact: table
 Integrity: no missing links between records, no orphans. Impact: application
 Duplication: no unnecessary multiple representations of the same data. Impact: application
 Timeliness: the data is sufficiently up to date (e.g. current time - time the record was saved in the DB). Impact: application
 Consistency: the data matches across applications. Impact: multiple applications
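The table- and application-level ratios above can be computed directly on a dataset. The sketch below is a minimal Python illustration, not a reference implementation: it assumes records are held as dictionaries, that a value is "void" when it is missing, None or blank, and that a hypothetical reference set (e.g. extracted from master data) defines which values are correct. All function names and sample data are hypothetical. Integrity and consistency are left out because they need several tables or applications.

```python
import re
from datetime import datetime, timedelta


def is_void(value):
    """A value is 'void' if it is missing, None or a blank string (assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def accuracy(records, field, reference):
    """Accuracy: nb correct values / nb total, 'correct' meaning present in a reference set."""
    return sum(1 for r in records if r.get(field) in reference) / len(records)


def void_rate(records, field):
    """Completeness: nb void values / nb total (lower is better)."""
    return sum(1 for r in records if is_void(r.get(field))) / len(records)


def conformity(records, field, pattern):
    """Conformity: share of non-void values that fully match a regex."""
    values = [r[field] for r in records if not is_void(r.get(field))]
    if not values:
        return 1.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)


def duplication_rate(records, key):
    """Duplication: share of records whose key was already used by another record."""
    return (len(records) - len({r[key] for r in records})) / len(records)


def timeliness(records, field, now, max_age):
    """Timeliness: share of records with (now - time saved) <= max_age."""
    return sum(1 for r in records if now - r[field] <= max_age) / len(records)


# Hypothetical sample table and reference list
records = [
    {"id": 1, "city": "Lyon", "country": "FR", "saved_at": datetime(2015, 3, 30)},
    {"id": 2, "city": "Lyonn", "country": "fr", "saved_at": datetime(2015, 3, 1)},
    {"id": 3, "city": "", "country": "FRA", "saved_at": datetime(2015, 3, 29)},
    {"id": 3, "city": "Paris", "country": None, "saved_at": datetime(2014, 12, 1)},
]
cities = {"Lyon", "Paris"}
now = datetime(2015, 3, 31)

print(accuracy(records, "city", cities))                          # 0.5
print(void_rate(records, "city"))                                 # 0.25
print(round(conformity(records, "country", r"[A-Z]{2}"), 2))      # 0.33
print(duplication_rate(records, "id"))                            # 0.25
print(timeliness(records, "saved_at", now, timedelta(days=7)))    # 0.5
```

Each function returns a ratio between 0 and 1, so the same numbers can be tracked as KPIs over time.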
Data quality – Why / How
 DATA QUALITY
 Needed by the Master Data module to build the ‘single version of truth’
 Needed to clean and fix data in applications
 Needed to find where poor quality arises in data processing
 Poor quality = issues in applications and processes, and a brake on growth and on new projects
 Prerequisite for exploiting data (garbage in -> garbage out)
 APPLY A GLOBAL APPROACH TO THE DATA
 Plan-Do-Check-Act (PDCA) approach
 Focus on a defined scope of data
 Map the data process, define metrics/KPIs
 Collect
 Store
 Data Quality
 Identify poor quality in the process
 Actions to improve it
 Build quality in at data creation
 Monitor the results
(Diagram) Actions (action 1, action 2, action 3) positioned by cost vs. benefits, split between "Act on data" and "Act on process"
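The cost/benefit chart can be read as a simple prioritization rule: start with the actions that bring the most benefit per unit of cost. A minimal sketch, assuming each action has been given a cost and a benefit estimate on consistent scales; the `Action` class, `prioritize` function and all figures are hypothetical, invented for illustration rather than taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    target: str     # "data" or "process", the two families of actions
    cost: float     # estimated effort (hypothetical unit)
    benefit: float  # estimated gain (hypothetical unit)

    @property
    def ratio(self) -> float:
        return self.benefit / self.cost


def prioritize(actions):
    """Best benefit/cost ratio first; on ties, the cheaper action first."""
    return sorted(actions, key=lambda a: (-a.ratio, a.cost))


# Invented estimates, for illustration only
actions = [
    Action("action 1", "data", cost=2, benefit=4),
    Action("action 2", "process", cost=8, benefit=12),
    Action("action 3", "process", cost=5, benefit=15),
]

for a in prioritize(actions):
    print(a.name, a.target, a.ratio)
```

A single ratio is only one possible reading of such a chart; it hides the fact that process actions are usually more expensive but more durable.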
Data quality – Act on data / process
 ACT ONLY ON DATA
 Act directly on the data
 Easier, but the causes of poor quality remain
 Normalize empty values, apply business rules and regexes
 Use master data (countries, postal codes, PO sites, VAT numbers, addresses)
 Add integrity constraints in the DB
 Rebuild technical links between records
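Several of the "act on data" items (normalizing empty values, checking against master data, integrity constraints in the DB) can be combined at load time. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `country` master-data table and a hypothetical `site` table that references it; the placeholder list in `VOID_MARKERS` is also an assumption about what the source data contains.

```python
import sqlite3

# Hypothetical placeholders meaning "no value" in the source data
VOID_MARKERS = {"", "N/A", "NULL", "-", "?"}


def normalize_void(value):
    """Map every placeholder for 'no value' to None, i.e. SQL NULL."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped.upper() in VOID_MARKERS else stripped


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
# Master data: country codes, with a CHECK constraint on their format
conn.execute(
    "CREATE TABLE country (code TEXT PRIMARY KEY "
    "CHECK (length(code) = 2 AND code = upper(code)))"
)
# Business table: every non-NULL country_code must exist in the master data
conn.execute(
    "CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "country_code TEXT REFERENCES country(code))"
)
conn.executemany("INSERT INTO country VALUES (?)", [("FR",), ("DE",)])


def insert_site(site_id, name, country):
    conn.execute(
        "INSERT INTO site VALUES (?, ?, ?)", (site_id, name, normalize_void(country))
    )


insert_site(1, "Lyon plant", " FR ")
insert_site(2, "Unknown plant", "N/A")  # stored as NULL, not as the string "N/A"
try:
    insert_site(3, "Orphan plant", "XX")  # not in master data: rejected by the foreign key
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT id, country_code FROM site ORDER BY id").fetchall())
# [(1, 'FR'), (2, None)]
```

Note that a NULL foreign key is accepted: the constraint guarantees "no orphans", while completeness still has to be measured separately.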
 ACT ON PROCESS
 Modify the process to eliminate the causes of poor quality
 More difficult, but more sustainable
 Attach quality controls to each data source
 Use master data
 Replace manual tasks with ETL jobs
 Clarify the data process and its synchronization
 Look for the root cause: Who, What, Where, When, Why; "5 Whys" with the actors
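Attaching quality controls to each data source means checking records where they enter, before they are loaded, and keeping the rejected ones together with their source and the reasons, so the root-cause analysis has something to start from. A minimal sketch, assuming hypothetical source names (`erp`, `manual_entry`), fields and rules; in practice such a gate would typically run inside the ETL job that replaces the manual task.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# A rule returns an error message, or None when the record passes
Rule = Callable[[dict], Optional[str]]


def _is_blank(value) -> bool:
    return value is None or str(value).strip() == ""


def required(field: str) -> Rule:
    def check(rec: dict) -> Optional[str]:
        return f"{field} is empty" if _is_blank(rec.get(field)) else None
    return check


def matches(field: str, pattern: str) -> Rule:
    def check(rec: dict) -> Optional[str]:
        value = rec.get(field)
        if _is_blank(value) or re.fullmatch(pattern, str(value)):
            return None  # emptiness is the job of 'required'
        return f"{field}={value!r} does not match {pattern}"
    return check


# Hypothetical rule sets, one per data source
RULES: Dict[str, List[Rule]] = {
    "erp": [required("supplier_id"), matches("country", r"[A-Z]{2}")],
    "manual_entry": [
        required("supplier_id"),
        required("country"),
        matches("country", r"[A-Z]{2}"),
    ],
}


def quality_gate(source: str, batch: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split a batch from one source into accepted records and rejected ones with reasons."""
    accepted, rejected = [], []
    for rec in batch:
        errors = [msg for msg in (rule(rec) for rule in RULES[source]) if msg is not None]
        if errors:
            rejected.append({"source": source, "record": rec, "errors": errors})
        else:
            accepted.append(rec)
    return accepted, rejected


accepted, rejected = quality_gate("manual_entry", [
    {"supplier_id": "S1", "country": "FR"},
    {"supplier_id": "", "country": "France"},
])
print(len(accepted), len(rejected))  # 1 1
for r in rejected:
    print(r["source"], r["errors"])
```

Stricter rules on manual entry than on the ERP feed reflect the idea that controls are bound to each source rather than applied uniformly downstream.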
Data quality – Act on process
 IDENTIFY POOR QUALITY
 Human, process (~50%)
 Human: typos, inadequate help, insufficient controls
 Bad data sources, bad transformations in the ETL process
 Data model, application (~30%)
 MCD (conceptual data model): no key to uniquely identify a record
 Application: bad business controls, bugs, master data not used
 Production (~20%)
 Poor ILM (information lifecycle management): obsolescence, bad synchronization of data lifecycle and processing
 No master data, data in silos
 Other causes
 System migration
 Supplier changes, reorganization
(Screenshots) Monitoring by data source over time; instant monitoring by metric and by data source; collect
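The monitoring views track a metric per data source, both over time and as an instant value. A minimal sketch of the underlying aggregation, assuming each collected record carries hypothetical `source` and `day` fields and using the void rate (nb void values / nb total) as the monitored metric; `ALERT_THRESHOLD` is also an assumption, to be replaced by the KPI target defined for the scope.

```python
from collections import defaultdict
from datetime import date

ALERT_THRESHOLD = 0.2  # hypothetical: flag a source/day when more than 20% of values are void


def void_rate_by_source_and_day(records, field):
    """Completeness over time: nb void values / nb total, per (source, day)."""
    totals, voids = defaultdict(int), defaultdict(int)
    for rec in records:
        key = (rec["source"], rec["day"])
        totals[key] += 1
        value = rec.get(field)
        if value is None or str(value).strip() == "":
            voids[key] += 1
    return {key: voids[key] / totals[key] for key in sorted(totals)}


records = [
    {"source": "erp", "day": date(2015, 3, 1), "email": "a@example.com"},
    {"source": "erp", "day": date(2015, 3, 1), "email": ""},
    {"source": "erp", "day": date(2015, 3, 2), "email": "b@example.com"},
    {"source": "crm", "day": date(2015, 3, 1), "email": None},
]

for (source, day), rate in void_rate_by_source_and_day(records, "email").items():
    flag = "ALERT" if rate > ALERT_THRESHOLD else "ok"
    print(source, day.isoformat(), f"{rate:.0%}", flag)
```

Grouping by source, rather than computing one global figure, is what points the investigation at the data source responsible for the poor quality.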
Data quality – Act
 SOFTWARE

Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 

Data Quality in Big Data Management

  • 1. (Big) Data Management: Data Quality, Global Concepts in 5 slides (2016). Nicolas SARRAMAGNA, https://fr.linkedin.com/pub/nicolas-sarramagna/19/941/587
  • 3. COMPAGNIE PLASTIC OMNIUM CONFIDENTIAL. Data Quality in Data Management (MARCH 2015)
      ◦ DATA MANAGEMENT: multiple modules (Collect, Storage, Data Mining / Machine Learning, Data Viz, Governance, Security, Master Data, Data quality)
      ◦ BIG DATA: Velocity, Volume, Variety, Veracity, Value
  • 4. Data quality – What
      ◦ DATA QUALITY - VERACITY
          ▪ Applies to any data store, not only to a Master Data Management product
          ▪ Quality = metrics on your data
      ◦ METRICS
          ▪ The relevant metrics depend on the data
      ◦ EXAMPLES OF METRICS (impact level: TABLE, APPLI = one application, MULTI-APPLI = across applications)
          ▪ Accuracy: exact representation of the data (e.g. correct spelling). TABLE impact (e.g. number of correct values / total)
          ▪ Completeness: all fields are filled (e.g. no void values). TABLE (e.g. number of void values / total, i.e. the share of missing values)
          ▪ Conformity: values respect a specific format (e.g. a country code checked with a regex). TABLE
          ▪ Integrity: no missing links between records, no orphans. APPLI impact
          ▪ Duplication: no unnecessary multiple representations of the same data. APPLI
          ▪ Timeliness: is the data sufficiently up to date? (e.g. current time minus the time the record was saved in the DB). APPLI
          ▪ Consistency: the data matches across applications. MULTI-APPLI impact
  • 5. Data quality – Why / How
      ◦ DATA QUALITY
          ▪ Needed for the Master Data module and to build the 'single version of truth'
          ▪ Need to clean and fix data in applications
          ▪ Need to find where poor quality arises in data processing
          ▪ Poor quality = issues in applications and processes, and obstacles to growing and building
          ▪ Prerequisite to exploiting data (garbage in -> garbage out)
      ◦ APPLY A GLOBAL TREATMENT TO THE DATA
          ▪ Plan-Do-Check-Act approach
          ▪ Focus on a scope (perimeter) of data
          ▪ Understand the data process, define metrics/KPIs
          ▪ Collect, Store, Data Quality
          ▪ Identify poor quality in the process
          ▪ Actions to improve it
          ▪ Build quality in at the creation of the data
          ▪ Monitor the results
      [Figure: cost vs. benefits chart with actions 1 to 3 and the labels "Act on data" and "Act on process"]
  • 6. Data quality – Act on data / process
      ◦ ACT ONLY ON DATA
          ▪ Act directly on the data
          ▪ Easier, but the causes of poor quality remain
          ▪ Normalize void values, apply business rules and regexes
          ▪ Use master data (countries, postal codes, PO sites, VAT numbers, addresses)
          ▪ Add integrity constraints in the DB
          ▪ Rebuild technical links between records
      ◦ ACT ON PROCESS
          ▪ Modify the process to eliminate the causes of poor quality
          ▪ More difficult but more sustainable
          ▪ Bind a quality control to each data source
          ▪ Use master data
          ▪ Replace manual tasks with ETL jobs
          ▪ Clarify the data process and its synchronization
          ▪ Look for the root cause: Who, What, Where, When, Why; run a "5 Whys" with the actors
  • 7. Data quality – Act on process
      ◦ IDENTIFY POOR QUALITY
          ▪ Human, process: ~50%
              · Human: typos, inadequate user help, insufficient controls
              · Bad data sources, bad transforms in the ETL process
          ▪ Data model, application: ~30%
              · MCD (conceptual data model): no key that uniquely identifies a record
              · Application: bad business controls, bugs, master data not used
          ▪ Production: ~20%
              · Bad ILM (information lifecycle management): obsolescence, poor synchronization between the data lifecycle and its processing
              · No master data, data in silos
          ▪ Other causes
              · Migration of a system
              · Supplier changes, reorganization
      [Figures: monitoring by data source over time; instant monitoring by metric and by data source]
  • 8. Data quality – Act
      ◦ SOFTWARE
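The table-level metrics listed on slide 4 (accuracy, completeness, conformity) reduce to simple ratios over one column. The Python sketch below is illustrative only: it assumes a table held as a list of dicts, and the sample records, the reference set VALID_COUNTRIES (standing in for master data) and the two-uppercase-letter country pattern are hypothetical. Completeness is reported as 1 - (void values / total), so 1.0 means the field is always filled.

```python
import re

# Hypothetical sample table: each record maps a field name to a value.
RECORDS = [
    {"id": 1, "country": "FR", "city": "Lyon"},
    {"id": 2, "country": "fr", "city": ""},
    {"id": 3, "country": None, "city": "Paris"},
    {"id": 4, "country": "DE", "city": "Berlin"},
]

# Assumption: a reference list standing in for master data (valid country codes).
VALID_COUNTRIES = {"FR", "DE", "ES", "IT"}

# Assumption: conformity rule for a country code, exactly two uppercase letters.
COUNTRY_PATTERN = re.compile(r"[A-Z]{2}")


def is_void(value):
    """Treat None and blank strings as void (empty) values."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(records, field):
    """1 - (number of void values / total): 1.0 means the field is always filled."""
    if not records:
        return 1.0
    void = sum(1 for r in records if is_void(r.get(field)))
    return 1 - void / len(records)


def conformity(records, field, pattern):
    """Share of filled values whose whole text matches the expected format."""
    filled = [r[field] for r in records if not is_void(r.get(field))]
    if not filled:
        return 1.0
    return sum(1 for v in filled if pattern.fullmatch(v)) / len(filled)


def accuracy(records, field, reference):
    """Number of correct values / total, 'correct' meaning present in the reference set."""
    if not records:
        return 1.0
    return sum(1 for r in records if r.get(field) in reference) / len(records)


if __name__ == "__main__":
    print(completeness(RECORDS, "country"))  # 0.75
    print(conformity(RECORDS, "country", COUNTRY_PATTERN))  # 2 of the 3 filled values conform
    print(accuracy(RECORDS, "country", VALID_COUNTRIES))  # 0.5
```

Checking "correct spelling" against a reference list is only one possible definition of accuracy; where no reference exists, accuracy usually needs manual sampling.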
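The APPLI and MULTI-APPLI metrics on slide 4 (integrity, duplication, timeliness, consistency) compare records or systems rather than values within one column. A minimal Python sketch, assuming hypothetical in-memory tables (CUSTOMERS, ORDERS, and CRM_CUSTOMERS as a second application's copy) and, for duplication, exact equality after collapsing whitespace and ignoring case; real duplicate detection usually needs fuzzy matching.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hypothetical application tables: orders reference customers via customer_id.
CUSTOMERS = [
    {"customer_id": 1, "name": "ACME", "updated_at": datetime(2015, 3, 1, tzinfo=UTC)},
    {"customer_id": 2, "name": "Acme ", "updated_at": datetime(2014, 1, 15, tzinfo=UTC)},
    {"customer_id": 3, "name": "Globex", "updated_at": datetime(2015, 2, 20, tzinfo=UTC)},
]
ORDERS = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
    {"order_id": 12, "customer_id": 99},  # no customer 99: an orphan
]
# Hypothetical copy of the customer names held by a second application.
CRM_CUSTOMERS = [
    {"customer_id": 1, "name": "ACME"},
    {"customer_id": 3, "name": "Globex Corp"},
]


def orphans(children, parents, key):
    """Integrity: child records whose key matches no parent record."""
    parent_keys = {p[key] for p in parents}
    return [c for c in children if c[key] not in parent_keys]


def duplicate_groups(records, field):
    """Duplication: groups of records whose normalized field values are equal."""
    groups = {}
    for r in records:
        normalized = " ".join(r[field].split()).casefold()  # collapse spaces, ignore case
        groups.setdefault(normalized, []).append(r)
    return [g for g in groups.values() if len(g) > 1]


def stale_records(records, now, max_age):
    """Timeliness: records whose age (now - updated_at) exceeds max_age."""
    return [r for r in records if now - r["updated_at"] > max_age]


def inconsistent_keys(app_a, app_b, key, field):
    """Consistency: keys present in both applications whose field values differ."""
    b_values = {r[key]: r[field] for r in app_b}
    return [r[key] for r in app_a if r[key] in b_values and b_values[r[key]] != r[field]]


if __name__ == "__main__":
    now = datetime(2015, 3, 31, tzinfo=UTC)
    print([o["order_id"] for o in orphans(ORDERS, CUSTOMERS, "customer_id")])  # [12]
    print([[c["customer_id"] for c in g] for g in duplicate_groups(CUSTOMERS, "name")])  # [[1, 2]]
    print([c["customer_id"] for c in stale_records(CUSTOMERS, now, timedelta(days=90))])  # [2]
    print(inconsistent_keys(CUSTOMERS, CRM_CUSTOMERS, "customer_id", "name"))  # [3]
```

The 90-day freshness limit is an arbitrary example; an acceptable age depends on how the data is used.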
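Slide 6's "act only on data" steps (normalize void values, apply business rules and regexes, use master data) can be combined into one cleaning function. The sketch below is a hypothetical illustration: the list of void markers, the COUNTRY_MASTER extract and the five-digit French postal code rule are assumptions, not rules taken from the slides. Unknown values are kept and flagged rather than silently overwritten.

```python
import re

# Assumption: placeholder strings users type instead of leaving a field empty.
VOID_MARKERS = {"", "n/a", "na", "null", "none", "-", "?"}

# Hypothetical master data extract: accepted spellings mapped to ISO country codes.
COUNTRY_MASTER = {"fr": "FR", "france": "FR", "de": "DE", "germany": "DE", "deutschland": "DE"}

# Assumed business rule: a French postal code is exactly five ASCII digits.
FR_POSTAL_CODE = re.compile(r"[0-9]{5}")


def normalize_void(value):
    """Give 'empty' a single representation (None) and trim surrounding spaces."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.casefold() in VOID_MARKERS else text


def clean_record(record):
    """Return (cleaned copy, list of rule violations); the input is not modified."""
    cleaned = {k: normalize_void(v) for k, v in record.items()}
    issues = []

    country = cleaned.get("country")
    if country is not None:
        code = COUNTRY_MASTER.get(country.casefold())
        if code is None:
            issues.append(f"unknown country: {country!r}")  # keep the value, flag it
        else:
            cleaned["country"] = code  # replace free text with the master data code

    postal = cleaned.get("postal_code")
    if cleaned.get("country") == "FR" and postal is not None and not FR_POSTAL_CODE.fullmatch(postal):
        issues.append(f"invalid FR postal code: {postal!r}")

    return cleaned, issues


if __name__ == "__main__":
    raw = {"name": " Plant A ", "country": "France", "postal_code": "6900", "vat": "N/A"}
    print(clean_record(raw))
```

As slide 6 notes, this fixes the symptoms only: the same bad values will keep arriving until the process that produces them changes.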
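"Add integrity constraints in DB" (slide 6) moves some of the slide 4 metrics into the database itself, so bad rows are rejected at write time. A minimal sketch using Python's built-in sqlite3 module with an in-memory database; the country and customer tables and their columns are hypothetical. SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE country (
    code TEXT NOT NULL PRIMARY KEY
        CHECK (length(code) = 2 AND code = upper(code))  -- conformity rule
);
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    name         TEXT NOT NULL CHECK (trim(name) <> ''),  -- completeness rule
    vat_number   TEXT UNIQUE,                               -- duplication rule
    country_code TEXT NOT NULL REFERENCES country(code)     -- integrity: no orphans
);
"""

INSERT_CUSTOMER = (
    "INSERT INTO customer (customer_id, name, vat_number, country_code) VALUES (?, ?, ?, ?)"
)


def connect():
    """Open an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, sql, params):
    """Run one insert in its own transaction; return True if the DB accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = connect()
    print(try_insert(conn, "INSERT INTO country (code) VALUES (?)", ("FR",)))  # True
    print(try_insert(conn, "INSERT INTO country (code) VALUES (?)", ("fr",)))  # False: CHECK
    print(try_insert(conn, INSERT_CUSTOMER, (1, "ACME", "FR123", "FR")))       # True
    print(try_insert(conn, INSERT_CUSTOMER, (2, "Globex", "DE456", "XX")))     # False: orphan
    print(try_insert(conn, INSERT_CUSTOMER, (3, "   ", "X1", "FR")))           # False: empty name
    print(try_insert(conn, INSERT_CUSTOMER, (4, "Other", "FR123", "FR")))      # False: duplicate VAT
```

Constraints like these guarantee form, not truth: a well-formed but wrong VAT number still passes, so they complement rather than replace the metrics.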
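Slide 7 mentions two monitoring views: by data source over time, and an instant view by metric and by data source. The Python sketch below shows one possible shape for both, assuming a hypothetical log of (day, data source, metric, score) measurements and assumed KPI targets of the kind slide 5 asks to define; the source names and scores are invented for illustration.

```python
from datetime import date

# Hypothetical measurement log: (day, data source, metric, score between 0 and 1).
MEASUREMENTS = [
    (date(2015, 3, 1), "ERP", "completeness", 0.97),
    (date(2015, 3, 8), "ERP", "completeness", 0.93),
    (date(2015, 3, 8), "ERP", "conformity", 0.95),
    (date(2015, 3, 8), "CRM", "completeness", 0.99),
    (date(2015, 3, 1), "CRM", "conformity", 0.80),
    (date(2015, 3, 8), "CRM", "conformity", 0.88),
]

# Assumed KPI targets per metric.
TARGETS = {"completeness": 0.95, "conformity": 0.90}


def history(measurements, source, metric):
    """Monitoring over time: the (day, score) series of one metric for one data source."""
    return sorted((day, score) for day, src, m, score in measurements
                  if src == source and m == metric)


def below_target(measurements, targets):
    """Instant monitoring: latest score per (source, metric) that misses its target."""
    latest = {}
    for day, src, m, score in measurements:
        if (src, m) not in latest or day > latest[(src, m)][0]:
            latest[(src, m)] = (day, score)
    return sorted((src, m, score) for (src, m), (_, score) in latest.items()
                  if m in targets and score < targets[m])


if __name__ == "__main__":
    print(history(MEASUREMENTS, "CRM", "conformity"))
    print(below_target(MEASUREMENTS, TARGETS))  # [('CRM', 'conformity', 0.88), ('ERP', 'completeness', 0.93)]
```

Keeping the scores per data source, rather than one global figure, is what lets the root-cause analysis of slide 6 start from the source that degrades.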