Data Quality Solutions & Bad Data: A Case of Misplaced Confidence?
copyright Real-Time Technology Solutions, Inc. November, 2015 page 1
Nov 5, 2015
We all know that C-level executives make strategic decisions based on information from their BI and analytics initiatives in order to give their firms a competitive advantage. But what if the data is incorrect? Then they are making big bets, which affect the company's direction and future, on analyses built on bad data.
While reading articles on big data, data warehousing and data quality, I came across these statistics:
“90% of U.S. companies have some sort of data quality solution in place today” - Experian Data Quality
“The average organization loses $8.2 million annually through poor Data Quality.” - Gartner
“On average, U.S. organizations believe 32% of their data is inaccurate” - Experian Data Quality
“46% of companies cite data quality as a barrier for adopting Business Intelligence products” - InformationWeek
“Poor data quality is a primary reason for 40% of all business initiatives failing to achieve their targeted benefits” - Gartner
So why is there a disconnect between the first quote and the next four? If 90% of U.S. companies have some form of data quality solution in place, why are so many companies experiencing bad data issues?
Data Quality vs. Data Testing
Digging deeper, the answer becomes clear when you look at the characteristics of data quality tools. Below are the characteristics from Gartner’s 2014 Magic Quadrant for Data Quality Tools:
• Profiling: analysis of data to capture statistics (metadata)
• Parsing and standardization: decomposing text fields into components and formatting them based on standards and business rules
• Generalized "cleansing": modification of data values to meet domain restrictions, integrity constraints or other business rules
• Matching: identifying, linking or merging related entries within or across sets of data
• Monitoring: deploying controls to ensure that data continues to conform to business rules
• Enrichment: enhancing the value of data by appending consumer demographics & geography
• Subject-area-specific support: standardization capabilities for specific data subject areas
• Metadata management: ability to capture, reconcile & correlate metadata related to quality process
• Configuration environment: capabilities for creating, managing and deploying data quality rules
So while data quality software is incredibly important, none of the above characteristics specifically deals with validating data as it moves from source files, databases, XML and other data sources through the transformation process to the target data warehouse or big data store.
Data testing is completely different. According to the book "Testing the Data Warehouse Practicum" by Doug Vucevic
and Wayne Yaddow, the primary goals of data testing are:
• Data Completeness: Verifying that all data has been loaded from the sources to the target DWH
• Data Transformation: Ensuring that all data has been transformed correctly during the Extract-Transform-Load (ETL) process
• Data Quality: Ensuring that the ETL process correctly rejects, substitutes default values for, corrects, or ignores and reports invalid data
• Regression Testing: Testing existing functionality again to ensure it remains intact for each new release
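As an illustration of the first two goals, here is a minimal sketch using Python's built-in sqlite3 module. The table names (src_orders, dwh_orders), the columns and the cents-to-dollars rule are hypothetical and chosen only for this example; a real test would query the actual source system and the target warehouse.

```python
import sqlite3

# Hypothetical source and target tables in one in-memory database;
# a real test would query the actual source system and the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount_cents INTEGER);
    CREATE TABLE dwh_orders (id INTEGER, amount_dollars REAL);
    INSERT INTO src_orders VALUES (1, 1050), (2, 2000), (3, 999);
    -- Row 3 was never loaded, and row 2 was transformed incorrectly.
    INSERT INTO dwh_orders VALUES (1, 10.50), (2, 200.00);
""")

def completeness_gap(conn):
    """Data completeness: source row count minus target row count (0 = all loaded)."""
    src = conn.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
    tgt = conn.execute("SELECT COUNT(*) FROM dwh_orders").fetchone()[0]
    return src - tgt

def transformation_errors(conn):
    """Data transformation: ids whose loaded value breaks the cents-to-dollars rule."""
    rows = conn.execute("""
        SELECT s.id
        FROM src_orders s JOIN dwh_orders t ON t.id = s.id
        WHERE ABS(t.amount_dollars - s.amount_cents / 100.0) > 0.005
        ORDER BY s.id
    """).fetchall()
    return [r[0] for r in rows]

missing_rows = completeness_gap(conn)
bad_ids = transformation_errors(conn)
print(missing_rows, bad_ids)
```

A row-count comparison is only the coarsest completeness check; it says that something is missing, not which rows, which is why the transformation check joins on the key.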
Data Testing Methods
Many companies already perform data testing, data validation and reconciliation because they know how important these are. The problem is that, for all of the advances made in software for big data, data warehouses and databases, data testing is still largely a manual process that is loaded with risk and prone to letting massive amounts of bad data through.
The two most prevalent methods used for data testing are:
• Sampling (also known as "Stare and Compare") - The tester writes SQL to extract data from the source and from the target data warehouse or big data store, dumps the two result sets into Excel and performs a “stare and compare”, that is, verifies the data by viewing or “eyeballing” the results. Since a single test query can return as many as 200 million rows with 200 columns (40 billion data values), and most test teams have hundreds of these tests, it is impossible to validate more than a fraction of 1% of the data this way, so the method cannot be counted on to find data errors.
• Minus Queries - Using the MINUS method, the tester queries the source data and the target data and subtracts the first result set from the second to determine the difference. If there is no difference, the result set is empty. The MINUS is then performed again, subtracting the second set from the first (see example here). This has its value, but potential issues are (a) the result sets may not be accurate when dealing with duplicate rows, (b) this method does not produce historical data and reports, which is a concern for audit and regulatory reviews, and (c) processing MINUS queries puts pressure on the servers.
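The two-way MINUS comparison described above can be sketched as follows. This is a minimal example using Python's built-in sqlite3 module, where the set-difference operator is spelled EXCEPT rather than MINUS (the keyword used by Oracle). The table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_rows (id INTEGER, name TEXT);
    CREATE TABLE target_rows (id INTEGER, name TEXT);
    INSERT INTO source_rows VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Target: 'Bob' was altered, and 'Cy' was loaded twice.
    INSERT INTO target_rows VALUES (1, 'Ann'), (2, 'Rob'), (3, 'Cy'), (3, 'Cy');
""")

def minus(conn, left, right):
    """Return the rows of `left` that are absent from `right` (set difference)."""
    # Table names are interpolated only because they are fixed, trusted
    # identifiers in this sketch.
    sql = (f"SELECT id, name FROM {left} "
           f"EXCEPT SELECT id, name FROM {right} ORDER BY id")
    return conn.execute(sql).fetchall()

source_only = minus(conn, "source_rows", "target_rows")
target_only = minus(conn, "target_rows", "source_rows")
target_count = conn.execute("SELECT COUNT(*) FROM target_rows").fetchone()[0]

print(source_only)   # rows in the source with no exact match in the target
print(target_only)   # rows in the target with no exact match in the source
print(target_count)  # the extra 'Cy' row shows up only in the row count
```

Both directions flag the altered row, but neither reports the duplicated (3, 'Cy') row, because set difference discards duplicates. That is issue (a) above, and it is one reason a row-count check is usually run alongside the two queries.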
These manual processes are tedious and inefficient. They provide limited coverage of data validation, which leaves a real probability that bad data remains in these data stores and, from there, makes its way into BI and analytics reports.
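To put the coverage problem in numbers, the short calculation below uses the query size cited above (200 million rows by 200 columns). The figure of 1,000 rows reviewed by eye is an assumption made purely for illustration and does not come from any survey.

```python
ROWS = 200_000_000
COLUMNS = 200
total_values = ROWS * COLUMNS  # 40 billion individual data values

# Assumption (illustrative only): a tester eyeballs 1,000 rows of one query.
REVIEWED_ROWS = 1_000
coverage_pct = 100 * (REVIEWED_ROWS * COLUMNS) / total_values

print(f"{total_values:,} values; coverage {coverage_pct:.4f}%")
```

Even under this generous assumption, the reviewed sample covers about five ten-thousandths of one percent of a single query's output.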
Automated Data Testing solutions to the rescue
But there is help out there. A new sector of software vendors has emerged to fill the need for automated data testing. Led by RTTS' QuerySurge, these testing solutions can quickly provide automated comparisons of up to 100% of all data movement, which leads to improved data quality, a reduction in data costs and bad data risks, shared data health information, and a significant return on investment.
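To make the idea of an automated, full comparison concrete, here is a minimal Python sketch. It is not a description of how QuerySurge or any other product works; the function name, the result fields and the sample rows are all hypothetical. It compares every row as a multiset, so duplicates are counted, and it timestamps each result so that a history of runs can be kept.

```python
from collections import Counter
from datetime import datetime, timezone

def compare_rows(source_rows, target_rows):
    """Compare every row as a multiset, so duplicate rows are counted too."""
    src, tgt = Counter(source_rows), Counter(target_rows)
    return {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "missing_in_target": sorted((src - tgt).elements()),
        "unexpected_in_target": sorted((tgt - src).elements()),
    }

history = []  # in practice this would be persisted for audit reviews

source = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
target = [(1, "Ann"), (2, "Rob"), (3, "Cy"), (3, "Cy")]

result = compare_rows(source, target)
history.append(result)
print(result["missing_in_target"], result["unexpected_in_target"])
```

Unlike a set-based MINUS, the multiset comparison reports the duplicated row, and the appended result gives a simple record of what each run found.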
So while data quality tools are an important part of the data solution, data testing complements them to complete the data health picture and gives C-level executives and their teams confidence that the strategic, potentially game-changing decisions they are making are based on validated, accurate data.
About QuerySurge
QuerySurge is the software division of RTTS.
RTTS’ team of test experts developed QuerySurge™ to address the unique testing needs in the Big Data and Data Warehousing spaces. QuerySurge is the leading Data Testing solution built specifically to automate the testing of Data Warehouses & Big Data. QuerySurge makes it easy for both novice and experienced team members to validate their organization's data quickly, analyzing and pinpointing up to 100% of all data differences while providing both real-time and historical views of that data’s health.
