AN INTRODUCTION TO
DATA QUALITY SERVICES
koen verbeeck
BI consultant
WHO AM I
• BI consultant @ Ordina

• member of SQLUG.be

• MCTS, MCITP in SQL Server 2008

• working with Microsoft BI for over 2 years

• beer and comic books enthusiast

• married with children…
INTRODUCTION

data quality?
          Data are of high quality "if they are fit for their intended uses in
          operations, decision making and planning" (J. M. Juran).
          - Wikipedia on Data Quality


• achieved through people, technology & processes
• can be measured with various dimensions
  •   accuracy
  •   consistency
  •   completeness
  •   duplicates (uniqueness)
  •   timeliness
  •   validity
• bad data = bad business
INTRODUCTION


Data Quality   Issue                                  Sample Data Problem

Standard       Are data elements consistently         Gender code = M, F, U in one system and
               defined and understood?                Gender code = 0, 1, 2 in another system

Complete       Is all necessary data present?         20% of customers’ last names are blank;
                                                      50% of zip codes are 99999

Accurate       Does the data accurately represent     A supplier is listed as ‘Active’ but went
               reality or a verifiable source?        out of business six years ago

Valid          Do data values fall within             Temperature recordings should be between
               acceptable ranges?                     -100°C and +100°C

Unique         Does the same entity appear            Prince, The Artist Formerly Known as Prince,
               more than once?                        The Artist, … are they the same person?
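The completeness, validity and standardization problems in the table can be measured with a few lines of code. The sketch below is a minimal Python illustration, not part of DQS: the customer rows, the 99999 placeholder zip code and the mapping of the second system's gender codes 0, 1, 2 onto M, F, U are assumptions made for the example.

```python
# Minimal data quality checks; all data and mappings are illustrative assumptions.

def completeness(records, field, placeholders=()):
    """Fraction of records whose field is present, non-blank and not a placeholder."""
    def is_filled(value):
        return value is not None and str(value).strip() != "" and value not in placeholders
    return sum(is_filled(r.get(field)) for r in records) / len(records)


def validity(values, low, high):
    """Fraction of values inside the acceptable range [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)


# Assumed mapping from the second system's codes onto the first system's codes.
GENDER_CODES = {"0": "M", "1": "F", "2": "U"}


def standardize_gender(code):
    """Translate a numeric gender code; letter codes pass through unchanged."""
    return GENDER_CODES.get(code, code)


customers = [  # hypothetical rows
    {"last_name": "Doe", "zip": "10001"},
    {"last_name": "", "zip": "99999"},
    {"last_name": "Verbeeck", "zip": "99999"},
    {"last_name": "Smith", "zip": "2000"},
    {"last_name": "Jones", "zip": "99999"},
]

print(completeness(customers, "last_name"))                     # 0.8
print(completeness(customers, "zip", placeholders=("99999",)))  # 0.4
print(validity([21.5, -3.0, 250.0, 99.9], -100, 100))           # 0.75
print(standardize_gender("1"), standardize_gender("F"))         # F F
```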
INTRODUCTION
(diagram: four data quality activities)
• Profiling: analysis of the data source to provide insight into the quality of the data and help to identify data quality issues.
• Cleansing: amend, remove or enrich data that is incorrect or incomplete. This includes correction, standardization and enrichment.
• Matching: identifying, linking or merging related entries within or across sets of data.
• Monitoring: tracking and monitoring the state of quality activities and the quality of data.
OUTLINE
• introduction

• overview of data quality services

• building a knowledge base

• data cleansing & matching

• SSIS integration

• conclusion
OVERVIEW OF DQS




Data Quality Services (DQS) is a knowledge-driven data quality solution, enabling IT pros and data stewards to easily improve the quality of their data.
OVERVIEW OF DQS

• Knowledge-Driven: based on a Data Quality Knowledge Base (DQKB)
• Semantics: data domains capture the semantics of your data
• Knowledge Discovery: acquires additional knowledge the more you use it
• Open and Extensible: supports user-generated knowledge and IP from third-party reference data providers
• Easy to Use: compelling user experience designed for increased productivity
OVERVIEW OF DQS
• easy installation
  • pre-installation checks
    o SQL Server 2012 database engine (server)
    o .NET 4.0 & IE 6.0 or higher (client)
  • installation of DQS using SQL Server Setup
  • post-installation tasks
    o run DQSInstaller.exe
    o grant DQS roles to users
    o enable TCP/IP
BUILDING A KNOWLEDGE BASE
(diagram: knowledge management)
• Build: discover / explore data / connect, with integrated profiling
• the result is stored in the Knowledge Base
• Use: the knowledge base drives DQ projects
BUILDING A KNOWLEDGE BASE
(diagram: the parts of a knowledge base)
• domains: represent the data type and hold the values
• composite domains
• rules & relations
• 3rd party reference data
• matching policy
DEMO
• our first knowledge base
BUILDING A KNOWLEDGE BASE
• iterative process
• knowledge discovery
  • gather knowledge from
    o Excel
    o SQL Server
  • profiling of data
    o not the same as the SSIS Data Profiling task!
  • automatically detects anomalies
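DQS performs its own profiling during knowledge discovery, and how it detects anomalies internally is not described on these slides. As a rough illustration of the idea only, the hypothetical profile_column function below counts missing and distinct values and flags values that are rare relative to the rest of the column (a simple frequency heuristic with an assumed 10% threshold), which is how a typo such as "Brusels" tends to stand out.

```python
from collections import Counter


def profile_column(values, rare_threshold=0.1):
    """Summarize one column: rows, missing values, distinct values, and values
    whose share of the non-missing rows is below rare_threshold (an assumed
    heuristic, not the DQS algorithm)."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    counts = Counter(present)
    total = len(present)
    rare = sorted(v for v, n in counts.items() if n / total < rare_threshold) if total else []
    return {
        "rows": len(values),
        "missing": len(values) - total,
        "distinct": len(counts),
        "rare_values": rare,
        "top_values": counts.most_common(3),
    }


# Hypothetical city column with one misspelling and two missing values.
cities = ["Brussels"] * 10 + ["Antwerp"] * 9 + ["Brusels", "", None]
print(profile_column(cities)["rare_values"])  # ['Brusels']
```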
BUILDING A KNOWLEDGE BASE
• domain management
  • knowledge about fields is kept in domains

  • data steward can
    o   create rules
    o   assign synonyms and corrections
    o   create term-based relations (str. --> street)
    o   link domains together into
        composite domains


  • import knowledge from
    o reference data (e.g. Azure Marketplace)
    o other knowledge bases
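The kinds of knowledge a data steward adds to a domain can be pictured as plain data structures. The Python sketch below is hypothetical and does not use any DQS API: term-based relations are a dictionary of abbreviations, synonyms map onto a leading value, and a domain rule is a regular expression (a deliberately simple e-mail pattern, assumed for the example).

```python
import re

# Hypothetical domain knowledge, named after the concepts on the slide.
STREET_TERMS = {"str.": "street", "st.": "street"}          # term-based relations
COUNTRY_SYNONYMS = {"BE": "Belgium", "België": "Belgium"}  # synonym -> leading value
EMAIL_RULE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")  # simplistic domain rule


def apply_term_relations(value, terms):
    """Replace each standalone term (case-insensitive) by its full form."""
    for term, full in terms.items():
        pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
        value = re.sub(pattern, full, value, flags=re.IGNORECASE)
    return value


def to_leading_value(value, synonyms):
    """Map a synonym onto its leading value; other values pass through."""
    return synonyms.get(value, value)


def satisfies_rule(value, rule):
    """True when the whole value matches the rule's pattern."""
    return rule.fullmatch(value) is not None


print(apply_term_relations("High Str. 13", STREET_TERMS))  # High street 13
print(to_leading_value("België", COUNTRY_SYNONYMS))         # Belgium
print(satisfies_rule("john.doe@hotmail", EMAIL_RULE))       # False
```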
DATA CLEANSING & MATCHING
• cleansing
  • why?
    o identifies incomplete or incorrect data
    o standardizes and enriches data by using domain values, domain rules and reference data
  • examples
    o St. --> street (corrected)
    o Microsot --> Microsoft (corrected)
    o john.doe@hotmail (invalid)
    o 0472/34672 (invalid)
    o Verbeek --> Verbeeck (suggested)
  • DQS cleansing
    o create a knowledge base or select an existing one
    o create a data quality project
    o 2-step process
      – computer-assisted cleansing
      – interactive cleansing
    o export results
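The cleansing statuses listed above (corrected, suggested, invalid) can be approximated as follows. This is a sketch under assumptions, not the DQS algorithm: explicit corrections come from a dictionary, rules are plain predicates, and suggestions use Python's difflib similarity with an assumed cutoff of 0.85 in place of whatever scoring DQS applies.

```python
import difflib
import re


def cleanse(value, known_values, corrections=None, rule=None, cutoff=0.85):
    """Classify one value as correct, corrected, invalid, suggested or new.

    known_values: list of valid domain values
    corrections:  explicit value -> correction mapping kept in the domain
    rule:         optional predicate the value must satisfy
    cutoff:       assumed minimum similarity (0..1) for a suggestion
    """
    corrections = corrections or {}
    if value in known_values:
        return "correct", value
    if value in corrections:
        return "corrected", corrections[value]
    if rule is not None and not rule(value):
        return "invalid", value
    close = difflib.get_close_matches(value, known_values, n=1, cutoff=cutoff)
    if close:
        return "suggested", close[0]
    return "new", value


def is_email(value):
    """Hypothetical e-mail rule: something@something.tld."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", value) is not None


print(cleanse("Microsot", ["Microsoft"], corrections={"Microsot": "Microsoft"}))
print(cleanse("Verbeek", ["Verbeeck", "Doe"]))
print(cleanse("john.doe@hotmail", [], rule=is_email))
```

With these inputs the three calls return ('corrected', 'Microsoft'), ('suggested', 'Verbeeck') and ('invalid', 'john.doe@hotmail'), mirroring the examples on the slide.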
DATA CLEANSING & MATCHING
• matching
  • why?
    o identify duplicates within the data source
    o create a consolidated view of the data
  • examples
    o Prince / The Artist Formerly Known As Prince / The Artist
    o Jon Doe, High Street 13, NY, doe@gmail.com
      vs. John Doe, High Str, NY, doe@gmail.com
  • DQS matching
    o build a matching policy in the KB
    o matching training
    o create a matching project
    o choose survivors
(screenshot: DQ Client – Match Results)
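A matching policy compares records field by field and combines the results into one score. The sketch below assumes a hypothetical policy with per-field weights, exact or fuzzy comparison, and a minimum score of 0.8; fuzzy comparison uses difflib.SequenceMatcher as a stand-in, not the similarity function DQS itself uses. It is applied to the Jon Doe / John Doe example from the slide.

```python
from difflib import SequenceMatcher

# Hypothetical matching policy: field -> (weight, exact). Weights sum to 1.0.
POLICY = {
    "name": (0.4, False),
    "street": (0.3, False),
    "city": (0.1, True),
    "email": (0.2, True),
}
MIN_SCORE = 0.8  # assumed minimum matching score


def field_similarity(a, b, exact):
    """1.0 or 0.0 for exact fields, otherwise a 0..1 string similarity."""
    a, b = a.strip().lower(), b.strip().lower()
    if exact:
        return 1.0 if a == b else 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(r1, r2, policy=POLICY):
    """Weighted sum of the per-field similarities."""
    return sum(weight * field_similarity(r1[f], r2[f], exact)
               for f, (weight, exact) in policy.items())


def is_match(r1, r2, policy=POLICY, min_score=MIN_SCORE):
    return match_score(r1, r2, policy) >= min_score


jon = {"name": "Jon Doe", "street": "High Street 13", "city": "NY", "email": "doe@gmail.com"}
john = {"name": "John Doe", "street": "High Str", "city": "NY", "email": "doe@gmail.com"}
print(round(match_score(jon, john), 2), is_match(jon, john))  # 0.89 True
```

With these assumed weights the pair scores about 0.89 and is reported as a match; raising MIN_SCORE above that value would turn it into a non-match, which is the kind of trade-off a matching policy has to be tuned for.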
DEMO
• cleanse data
• use a matching policy to find
  duplicates
DATA CLEANSING & MATCHING
• create a cleansing project
  • uses knowledge gathered in a DQS knowledge base

  • simple user-friendly process

  • profile results
DATA CLEANSING & MATCHING
• create a matching project
  • uses a matching policy created
    in a knowledge base

  • eliminates duplicates

  • profile results

  • the more knowledge is added, the better the results will be
    o tip: clean up the data first using a cleansing project


  • choose survivors at the end

  • export results into .csv
    or SQL Server
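Once duplicates are grouped, one record per group has to survive. A common survivorship rule is to keep the most complete record; the sketch below assumes that rule, with total content length as a tie-breaker. It is a hypothetical illustration, not the DQS export logic.

```python
def filled_fields(record):
    """Number of fields holding a non-blank value."""
    return sum(1 for v in record.values() if v is not None and str(v).strip())


def content_length(record):
    """Total length of all non-missing values."""
    return sum(len(str(v)) for v in record.values() if v is not None)


def choose_survivor(cluster):
    """Keep the most complete record, breaking ties on total content length.
    max() returns the first record among equally ranked ones."""
    return max(cluster, key=lambda r: (filled_fields(r), content_length(r)))


# Hypothetical cluster of records already identified as duplicates.
cluster = [
    {"name": "John Doe", "street": "High Str", "city": "NY", "email": "doe@gmail.com"},
    {"name": "Jon Doe", "street": "High Street 13", "city": "NY", "email": "doe@gmail.com"},
    {"name": "J. Doe", "street": "", "city": "NY", "email": "doe@gmail.com"},
]
print(choose_survivor(cluster)["street"])  # High Street 13
```

The surviving record keeps its own spelling ("Jon Doe"), one reason to clean up the data with a cleansing project before matching.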
SSIS INTEGRATION
(diagram: SSIS data flow)
• SSIS package: Source + Mapping --> Data Correction Component --> Destination
• the correction component uses the knowledge base: values/rules and reference data definitions
DEMO
• an SSIS cleansing project
SSIS INTEGRATION
• cleansing as a batch process

• only cleansing; matching is not (yet?) possible

• composite domains are supported
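In SSIS, batch cleansing is done by the DQS Cleansing component inside a package's data flow, and downstream components can route rows by their cleansing status. The Python below uses neither SSIS nor DQS; it only mimics that batch-and-route pattern, and the "company" column, the output column name, the status labels and the batch size are all hypothetical.

```python
from collections import defaultdict
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def cleanse_row(row, known, corrections):
    """Hypothetical per-row cleansing of a 'company' column."""
    value = row["company"]
    if value in known:
        status, output = "correct", value
    elif value in corrections:
        status, output = "corrected", corrections[value]
    else:
        status, output = "new", value
    return {**row, "company_output": output, "status": status}


def run_batch_job(rows, known, corrections, batch_size=1000):
    """Cleanse rows batch by batch and route each row to one output per status."""
    outputs = defaultdict(list)
    for batch in batches(rows, batch_size):
        for row in batch:
            cleansed = cleanse_row(row, known, corrections)
            outputs[cleansed["status"]].append(cleansed)
    return dict(outputs)


rows = [{"id": 1, "company": "Microsoft"},
        {"id": 2, "company": "Microsot"},
        {"id": 3, "company": "Contoso"}]
result = run_batch_job(rows, known={"Microsoft"},
                       corrections={"Microsot": "Microsoft"}, batch_size=2)
print({status: [r["id"] for r in rs] for status, rs in result.items()})
# {'correct': [1], 'corrected': [2], 'new': [3]}
```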
CONCLUSION
• Knowledge-driven
  o rich knowledge base
  o continuous improvement and knowledge acquisition
  o build once, reuse for multiple DQ improvements
• Easy to use
  o focus on productivity and user experience
  o designed for business users
  o out-of-the-box knowledge
• Open & extensible
  o focus on cloud-based reference data
  o user-generated knowledge
  o integration with SSIS
RESOURCES
• DQS Team Blog @ MSDN
  http://blogs.msdn.com/b/dqs/

• DQS documentation @ MSDN
  http://msdn.microsoft.com/en-us/library/ff877917(v=sql.110).aspx

• SQL Server 2012 Resource Center (nice How-To videos)
  http://msdn.microsoft.com/en-us/sqlserver/ff898410.aspx

• DQS Forum @ MSDN
  http://social.msdn.microsoft.com/Forums/en-US/sqldataqualityservices/threads

• TechEd presentation about DQS by Elad Ziklik
  http://channel9.msdn.com/Events/TechEd/NorthAmerica/2011/DBI207
THE END
thanks for watching!
