Kokila Rudresh
Shalini Saini
Data Analytics - Testing Spectrum
VodQA 2016
Data Analytics: An Introduction
Collection
Processing Modelling Inference Visualization
Data Analytics: Use Cases
Business Intelligence
Social Networks
Astronomy and Astrophysics
Finance and Stock Market
Medical Imaging
Computer Graphics
Computer Vision
Energy Exploration
Maps
Retail
Data Analytics: Why Testing is Important
Volume
Domain
Complexity
Variety
Computations
Testing
Data Analytics: Testing Challenges
Data Validation
Model Implementation
Business Perspective
Data Analytics: Typical System Implementation
Extract
Transform
Load
Source
Data
Raw Data → ETL → Modelling → Aggregation → Visualization
Source Data
ETL Process
Modelling
Aggregation
Visualization
Data Analytics Testing - Approach
Pre-ETL Validations
Post-ETL Tests
Model Validations
Aggregation Validations
Visualization Validations
Format
Consistency
Completeness
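The three pre-ETL checks can be sketched roughly as follows. This is a minimal illustration, assuming the source data arrives as CSV text; the column names, the date pattern and the reference city list are hypothetical and not taken from the talk.

```python
import csv
import io
from datetime import datetime

# Hypothetical data template agreed with the client (assumption, not from the talk).
EXPECTED_COLUMNS = ["name", "phone", "email", "address", "city", "order_date"]
MANDATORY = ["name", "phone", "email"]
DATE_FORMAT = "%Y-%m-%d"


def validate_pre_etl(csv_text, reference_cities):
    """Return a list of (row_number, problem) tuples for a CSV payload."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))

    # Format: the header must match the agreed template exactly.
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [(0, f"unexpected columns: {reader.fieldnames}")]

    for row_number, row in enumerate(reader, start=1):
        # Format: dates must follow the agreed pattern.
        try:
            datetime.strptime(row["order_date"] or "", DATE_FORMAT)
        except ValueError:
            problems.append((row_number, "bad date format"))

        # Consistency: every city must exist in the reference data.
        if row["city"] not in reference_cities:
            problems.append((row_number, f"unknown city: {row['city']}"))

        # Completeness: mandatory fields must be non-empty (address is optional).
        for field in MANDATORY:
            if not (row[field] or "").strip():
                problems.append((row_number, f"missing {field}"))

    return problems


sample = (
    "name,phone,email,address,city,order_date\n"
    "Asha,12345,asha@example.com,,Bangalore,2016-03-01\n"
    "Ravi,,ravi@example.com,MG Road,Atlantis,01/03/2016\n"
)
issues = validate_pre_etl(sample, reference_cities={"Bangalore", "Pune"})
for issue in issues:
    print(issue)
```

Returning a list of problems instead of failing on the first one lets the whole file be reviewed with the data provider in a single pass before handover.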
Data Analytics - Testing
Pre-ETL Validations
Pre ETL Testing
Data Analytics - Testing
Post-ETL Tests
Meta-data
Data transformation
Data quality checks
Business-specific validations
Post ETL Testing
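A source-to-target reconciliation of the kind listed above might look like the sketch below. It assumes both sides are reachable through SQL; the in-memory SQLite tables, their names and the title-case transformation rule are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_sales (id INTEGER, city TEXT, amount REAL);
    CREATE TABLE target_sales (
        id INTEGER PRIMARY KEY,
        city TEXT NOT NULL,
        amount REAL NOT NULL
    );
    INSERT INTO source_sales VALUES (1, 'pune', 100.0), (2, 'Pune', 50.5), (3, 'Delhi', 20.0);
    -- Simulated ETL output: city names normalised to title case.
    INSERT INTO target_sales VALUES (1, 'Pune', 100.0), (2, 'Pune', 50.5), (3, 'Delhi', 20.0);
    """
)


def scalar(query):
    """Run a query that returns a single value."""
    return conn.execute(query).fetchone()[0]


def reconcile():
    """Compare counts, aggregates and one transformation rule across source and target."""
    checks = {}
    # Completeness: every source row should be loaded.
    checks["row_count"] = scalar("SELECT COUNT(*) FROM source_sales") == scalar(
        "SELECT COUNT(*) FROM target_sales"
    )
    # Aggregates: totals should survive the transformation.
    checks["amount_total"] = abs(
        scalar("SELECT SUM(amount) FROM source_sales")
        - scalar("SELECT SUM(amount) FROM target_sales")
    ) < 1e-9
    # Transformation rule (assumed for this sketch): city is stored in title case.
    checks["city_title_case"] = scalar(
        "SELECT COUNT(*) FROM target_sales "
        "WHERE city <> UPPER(SUBSTR(city, 1, 1)) || LOWER(SUBSTR(city, 2))"
    ) == 0
    # Data quality: no NULLs in mandatory columns.
    checks["no_nulls"] = scalar(
        "SELECT COUNT(*) FROM target_sales WHERE city IS NULL OR amount IS NULL"
    ) == 0
    return checks


results = reconcile()
print(results)
```

Metadata checks (types, lengths, constraints) would normally query the database catalogue as well; they are left out here to keep the sketch short.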
Data Analytics - Testing
Model Validations
Implementation
Computation
Model Implementation Testing
Sales = a(Seasonality) + b(Trend) + c(Promotions) + d(Sales Channel) + other factors
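To make the model above concrete, the sketch below evaluates the linear form against a few past observations and computes R², one statistical metric commonly used for this kind of validation. The coefficients and data points are hypothetical, and the "other factors" term is omitted.

```python
def predict_sales(coefficients, factors):
    """Sales = a*Seasonality + b*Trend + c*Promotions + d*SalesChannel."""
    return sum(coefficients[name] * value for name, value in factors.items())


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical coefficients (a, b, c, d) and historic observations, for illustration only.
coefficients = {"seasonality": 2.0, "trend": 1.5, "promotions": 3.0, "sales_channel": 0.5}
history = [
    ({"seasonality": 10, "trend": 4, "promotions": 1, "sales_channel": 2}, 31.0),
    ({"seasonality": 12, "trend": 5, "promotions": 0, "sales_channel": 2}, 32.0),
    ({"seasonality": 8, "trend": 6, "promotions": 2, "sales_channel": 4}, 34.0),
]

predicted = [predict_sales(coefficients, factors) for factors, _ in history]
actual = [sales for _, sales in history]
score = r_squared(actual, predicted)
print(predicted, round(score, 4))
```

An R² close to 1 suggests the model explains most of the variation in the historic data; what threshold counts as acceptable is a domain decision to agree with the SMEs.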
Data Analytics - Testing
Aggregation Validations
Data Hierarchy
Data Scope
Summarized Values
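One way to check hierarchy, scope and summarized values together is to recompute the roll-up independently from the granular data and compare it with what the system reports. The sketch below assumes a three-level region/city/store hierarchy; all values and the scope rule are made up.

```python
from collections import defaultdict

# Granular records: (region, city, store, sales). All values are made up.
granular = [
    ("South", "Bangalore", "S1", 120.0),
    ("South", "Bangalore", "S2", 80.0),
    ("South", "Chennai", "S3", 50.0),
    ("North", "Delhi", "S4", 70.0),
    ("North", "Delhi", "S5", 30.0),
]


def roll_up(records, level, in_scope=lambda record: True):
    """Sum sales at a hierarchy level: 0 = region, 1 = city, 2 = store."""
    totals = defaultdict(float)
    for record in records:
        if in_scope(record):
            totals[record[: level + 1]] += record[3]
    return dict(totals)


def mismatches(expected, reported, tolerance=1e-6):
    """Return the keys whose summarized values are missing or differ."""
    keys = set(expected) | set(reported)
    return sorted(
        key
        for key in keys
        if key not in expected
        or key not in reported
        or abs(expected[key] - reported[key]) > tolerance
    )


# Hypothetical output of the system under test; store S5 should be out of scope.
reported_by_region = {("South",): 250.0, ("North",): 100.0}
expected_by_region = roll_up(granular, level=0, in_scope=lambda r: r[2] != "S5")
print(mismatches(expected_by_region, reported_by_region))
```

Here the North total is flagged because the reported figure includes the out-of-scope store, which is the kind of scope defect an aggregation check is meant to catch.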
Data Analytics - Testing
Visualization Validations
Information Representation
Data Format
Result Intuitiveness
Visualization Testing
Learnings
ANALYSE, CODE, TEST
Initial Data Flow
• Predefined data template
• Pre-ETL data validations
Domain Knowledge
• KT sessions involving SMEs
• Core computations
Business Involvement
• Test data closer to real-time data
• User flow prioritization
Learnings
Implementation
• Alternate implementation
• SME validation
Computation
• Addressing the right problem
• Computational factors
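The alternate implementation idea can be sketched as below: when expected values are hard to derive by hand, a second, deliberately simple implementation acts as the oracle for the optimized one. The moving-average computation is only a stand-in chosen for this example; the talk does not say which computations were duplicated.

```python
import random


def moving_average_fast(values, window):
    """Stand-in for the production computation: sliding-window running sum."""
    result, running = [], 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            result.append(running / window)
    return result


def moving_average_reference(values, window):
    """Independent, deliberately naive implementation used as the oracle."""
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def compare(values, window, tolerance=1e-6):
    """Return the positions where the two implementations disagree."""
    fast = moving_average_fast(values, window)
    reference = moving_average_reference(values, window)
    if len(fast) != len(reference):
        return ["length mismatch"]
    return [i for i, (f, r) in enumerate(zip(fast, reference)) if abs(f - r) > tolerance]


rng = random.Random(2016)  # fixed seed so a failing case can be reproduced
disagreements = []
for _ in range(100):
    data = [rng.uniform(0, 1000) for _ in range(rng.randint(1, 50))]
    window = rng.randint(1, len(data))
    disagreements.extend(compare(data, window))
print(disagreements)
```

The two implementations should share as little code as possible; otherwise a common defect passes unnoticed, which is one reason SME validation remains part of the process.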
Learnings
Testing Process
• Step-wise data validation
• Defect investigation
Test Automation
• Data combinations
• XML test data
Test Execution
• CI test execution
• Execution frequency
Testing Tools
• SpreadsheetGear
• Excel macros
Summary
Domain Context
Integrating Business Use-cases
Design and Testing Challenges
Testing Approach
Learnings
kokila@thoughtworks.com
sshalini@thoughtworks.com

Editor's Notes

  1. Data Analytics: the process of collecting and examining data with the goal of discovering useful information.
     It can be exploratory (for example, log file analysis) or driven by a specific problem statement (for example, market basket analysis).
     The result is not always a decision-making system; sometimes it is a decision support system.
     Process:
     • Collection: data is gathered in raw format from various sources such as online sources, survey data and satellites.
     • Processing: data is organized into a standard format.
     • Analysis: mathematical models are built to fit the existing data, and these models are used to infer results for new data.
     • Visualization: results are communicated in the form of tables, graphs and charts.
  2. Let's take some examples:
     • Banks analyze withdrawal and spending patterns to prevent fraud or identity theft.
     • E-commerce companies examine navigation patterns and previous purchases to determine customers' buying patterns.
     • Energy industries look into how energy consumption and operating costs can be optimized within a facility.
     Data analysis is the lifeline of a business, and a business is hard to sustain without analyzing the data available to it. Data analytics is used in many industries to help companies and organizations make better business decisions.
  3. Testing plays a crucial role in building a data analytics product.
     • Analytics is the lifeline of any progressive business.
     • It is critical for making informed decisions in business planning.
     • The complexities of domain, computation, volume and variety need to be tackled with a planned testing approach.
  4. Data validation: ensuring that the data is of the right quality throughout the process, across the various stages of data flow (gathering, representing, cleansing, transforming and so on).
     Model implementation: this is a crucial part, and in-depth domain knowledge is needed.
     • Validate that the chosen model is relevant for the respective domain.
     • Understand the statistical model thoroughly, including every parameter involved in the computation.
     • Validate that the computations are implemented as required, with the right understanding.
     Business perspective: data is available, analysis is performed and some results are out. Now, how do we share them with the business?
     • We need a clear vision of the business problem we are trying to solve.
     • The business perspective is important to ensure that the data represented serves its purpose.
     • What kinds of charts and graphs should be displayed, what levels of data aggregation are required, and is the UI intuitive?
  5. Raw data: gather data in raw format.
     ETL: process and organize the data. Extract data from multiple sources, transform it into the required format, and load it into the database.
     Modelling: initial analysis results in a model, which in turn yields the model parameters.
     Model implementation: apply the statistical models or algorithms and computations.
     Aggregation: data analysis and computation happen at the granular level; the data then needs to be aggregated at various hierarchies and levels as per the business requirement.
     Visualization: communicate the results of the analyzed data through visualization techniques, with effective visual communication through tables, graphs and charts.
  6. Format: is the data provided in the required format (CSV or Excel)? How many files or worksheets are there, what sort of data is in each sheet, and what are the data types? Check text casing, data formats, number formats and so on.
     Consistency: data needs to be consistent across sources. For example, sales data exists for a particular city but the city is missing from the reference data, or a cheque is cleared but there is no corresponding money transaction.
     Completeness: data is complete as expected. Every record has mandatory and optional aspects; in customer data, name, phone and email are mandatory while address might be optional. For example, in retail data, an inventory table might show stock reduced by 5 units while the corresponding sales data does not reflect that sale, so some data might be missing.
  7. Post-ETL validations:
     Metadata: ensuring the data model design is aligned with the real-world domain.
     • Includes data type, data length and index/constraint checks.
     • Validating the data modelling: dimensions and facts.
     Transformation: validating that the transformed data values are the expected values.
     • Validating the data transformation rules and source-to-target mapping.
     • Usually performed by validating counts, aggregates and actual data between the source and target.
     Quality:
     • Data checks: text case, special characters, number checks/precision, date format and so on.
     • Data constraint checks: ensuring the transformation conforms to the model, such as foreign key constraints, unique key constraints and null values.
     • Ensuring all the expected data is loaded into the DB completely.
     Business-specific: business-side, domain-specific validations such as possible values; client-agnostic as well as client-specific data checks.
  8. Model validation: validating that the chosen model is relevant to the domain. This is performed by applying the model to historical data and uses statistical metrics such as R².
     Implementation: understanding the logic behind the model or algorithms and getting the right values for the model parameters.
     Computation: validating the core analytics engine's step-wise computation.
  9. Aggregation:
     • Data should be aggregated at the required hierarchy level.
     • Only the data relevant to the scope should be considered for aggregation.
     • The summarized values computed for the selected data should be validated.
  10. UI validations:
      • Ensuring correct data representation in the form of tables, charts and graphs.
      • Validating the format of representation: units, scale, alignment, unit conversion and so on.
      • Usability testing of the tables, graphs and charts: color combinations, filtering, UI interaction and so on.
  11. Initial client data flow: setting a predefined data template and running pre-validations before data handover.
      Domain knowledge: the work is domain-intensive, so KT sessions were held within the team and the understanding was validated with SMEs. The simulation calculations were also mimicked in Excel with a smaller dataset to understand them thoroughly.
      Business involvement: providing test datasets closer to real-time data, and prioritizing the test scenarios to reflect real user experience.
  12. Implementation: there was no easy way to come up with expected data, so we decided on a parallel implementation, with business involvement in testing the model implementation.
      Computation/performance: understanding the transformations, data explosions, data representation and table joins, and analyzing the factors involved in computation that influence time and memory.
  13. Test data: deciding what subset of data would suffice to get the best data distribution, bridging the gap between ideal and real-world data, and coming up with edge-case datasets.
      Testing process:
      • Testing data at every stage of data transformation (divide and conquer).
      • Defect investigation with QA/Dev pairing.
      Tools: choosing tools that fit the purpose and suit their intended users: SpreadsheetGear, Excel macros, App manager.
      Automation:
      • DB structure varies per client, so there are generic tests (metadata SQLs) and client-specific tests.
      • The system is used by multiple users with differing backgrounds and varying metadata, so there are too many data combinations; hence a data-driven framework.
      • XML test data segregates the data for the various clients.
      • About 20% of the possible dataset covers 80% of the common use cases; SMEs are involved in edge cases.
      Execution:
      • Because of hardware, memory and time constraints, test execution in CI is organized cautiously.
      • Although automation was implemented at every stage, we decided cautiously how much automation coverage each stage required and set the execution frequency accordingly, taking into account resource usage, computation time and SME availability.
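The data-driven approach with per-client XML test data described in the last note could look roughly like this. The XML layout, the attribute names and the `revenue` function are assumptions made for the sketch; the talk does not show the actual test data format.

```python
import xml.etree.ElementTree as ET

# Hypothetical test data file, inlined here so the sketch is self-contained.
TEST_DATA = """
<clients>
  <client name="client_a">
    <case id="basic" units="5" price="10.0" expected="50.0"/>
    <case id="zero_units" units="0" price="10.0" expected="0.0"/>
  </client>
  <client name="client_b">
    <case id="bulk" units="100" price="2.5" expected="250.0"/>
  </client>
</clients>
"""


def revenue(units, price):
    """Stand-in for the computation under test."""
    return units * price


def load_cases(xml_text, client=None):
    """Yield (client, case_id, units, price, expected), optionally for one client."""
    root = ET.fromstring(xml_text.strip())
    for client_node in root.findall("client"):
        if client is not None and client_node.get("name") != client:
            continue
        for case in client_node.findall("case"):
            yield (
                client_node.get("name"),
                case.get("id"),
                int(case.get("units")),
                float(case.get("price")),
                float(case.get("expected")),
            )


def run(xml_text, client=None):
    """Run every case and return the (client, case_id) pairs that failed."""
    failures = []
    for client_name, case_id, units, price, expected in load_cases(xml_text, client):
        if abs(revenue(units, price) - expected) > 1e-9:
            failures.append((client_name, case_id))
    return failures


failures = run(TEST_DATA)
print(failures)
```

Keeping the cases in data rather than in code means a new client or a new data combination only needs a new XML entry, and the `client` filter allows a CI job to run a smaller subset when time or memory is constrained.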