SlideShare uma empresa Scribd logo
1 de 32
Baixar para ler offline
ChakraView
A 360° approach to data quality
Shankar Manian
Keerthika Thiyagarajan
Background
● ~15 years in Big Data...
● ...as Data Janitors
● Can we do better ?
Data Quality - Missing Focus
● Afterthought
● Needle in a haystack
● Huge cost
Detection - Missing Dimensions
● Completeness
● Consistency
● Auditability
Cleansing - The Hidden Cost
● Trace the issue to source
● No SOP on how to fix
● Hard to Automate
Visibility - Or the lack of it
● Impact - Cost of bad data
● Breakdown and Prioritization
● Push quality upstream
State before
● Stakeholder driven
● Reactive process
● Business metrics
● Huge monetary impact
● Iterative Discovery
Validations Framework
● Granular Validations -> Business metrics
● Self serve onboarding
● Tigger on data refresh
● System health dashboard
TransactionI
d
OrderId Amount B.Amount InvoiceId L.Amount
TX1 OD1 100 100 I1 10
TX2 OD2 50 50 I2 50
TX3 OD3 75 75 I3 75
TX4 OD4 200 200
TX5 OD5 50 I5 50
Bad Records
PaymentGateway * BankStatement * Ledger
Amount Mismatch
Entry missing in Ledger
Entry missing in Bank statement
Salient features
● Abstract templates
○ Null check
○ Datatype compliance
○ Aggregated check
○ Range check
○ Cross comparison check
● Filter and transformation support
○ Exclude few records
○ Case-insensitive conversion
● Construct target dataframe
● Row level results
Validations UI
Sample Validation
{
"fact": [{
"fact_1": "payment_gateway",
"fact_2": "ledger",
"join_type":
"full_outer_join",
"join_columns": [{
"fact_1_column":
"transaction_id",
"fact_2_column":
"transaction_id",
"operator":
"equal"
}]
}],
"group_by_columns": ["transaction_id"],
"idempotency_columns": [
"transaction_id"
],
"validation_configurations": [{
"name": "amount_recon",
"operator": "equal",
"expression_list": [{
"expression": {
"operator": "amount",
"terminal": "pg_amount"
}
},
{
"expression": {
"operator": "l.amount",
"terminal": "ledger_amount"
}
}
]
}]
}
Data Flow
Trigger from
Azkaban
Run spark job Publish validation
failures
Fact refresh
Dashboard Datastore
Template Library
Validation
Configuration
Until now we were blissfully ignorant, Now we spend multiple man hours
categorising the bad records
TransactionId OrderId B.Amount InvoiceId L.Amount Category
TX1 OD1 100 I1 10
Amount wrong in
Ledger entry
TX5 OD4 200
Upstream Failure-
Payments
TX6 OD6 I6 50 File upload issue
Root Cause Analysis(RCA)
Bank Statement * Ledger
Combinatorial explosion
● The cycle is longer for big data due to
● Complexity of the system
● Time consuming
● Error prone
● Humanly impossible
● Real-time systems has ELK kind of tools
● No tools available for Big data to RCA
How do we make this operation cheap?
Auto-RCA
● Enrich logs and data from main pipeline
Enrichments
{
"commerce_activity": {
"activityType": "create_ledger",
"activityId": "TX12345",
"payload": "{"event":"create_ledger","entity_id":"TX12345"}",
"eventStatus": "ERRORED",
"retryCount": 0
},
"error_details": {
"activityType": "create_ledger",
"activityId": "TX12345",
"errorCode": "503",
"errorDescription": "Error: EnricherException{statusCode=503}",
"sourceSystem": "IRN",
"upstreamUriSignature": "/payment/<transaction>",
"upstreamUrl": "/payment/TX12345",
"upstreamHttpMethod": "GET",
"upstreamHeader": null,
"upstreamPayload": null,
"errorStatus": "OPEN",
"failureCount": null,
}
}
Auto-RCA
● Perform 5 Why RCA
● Hierarchical categorisation
● Leaf category -> Unique issues
Unclassified
Amount mismatch Missing entries
Missing entries in Bank
statement
Missing entries in ledger
Issue in invoice creation
Issue in Bank statement
Event processing failure
Event not arrived
Wrong value in file
File upload issue
Data not pushed to
analytical store
Unclassified
Fixture
● Can we automate cleaning the data?
Fixture
Event processing failure
Event not arrived
Wrong value in file
File upload issue
Data not pushed to
analytical store
reprocess_event
replay_event
reprocess_file republish_ledger_entry
Fixture
{
"flowName": "debtor_flow",
"categoryName": "Event processing failure",
"recipeName": "reprocess_event"
}
Fixture
● Recipes - Library of functions that automate the cleansing
● Leaf Category -> Recipe
● Sample Recipes
○ Reverse
○ Retry
○ Restore
Architecture
● Man-days reduced to few hours.
● Reactive to proactive
● Dev-friendly
● People independent
● Complete visibility
Next Steps
● Open source
● Data observability
● Performance optimisation
Questions?

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksUnlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
 
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to ...
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
 
Embedding Insight through Prediction Driven Logistics
Embedding Insight through Prediction Driven LogisticsEmbedding Insight through Prediction Driven Logistics
Embedding Insight through Prediction Driven Logistics
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache Spark
 
Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!Counting Unique Users in Real-Time: Here's a Challenge for You!
Counting Unique Users in Real-Time: Here's a Challenge for You!
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
GoDaddy Customer Success Dashboard Using Apache Spark with Baburao Kamble
GoDaddy Customer Success Dashboard Using Apache Spark with Baburao KambleGoDaddy Customer Success Dashboard Using Apache Spark with Baburao Kamble
GoDaddy Customer Success Dashboard Using Apache Spark with Baburao Kamble
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Going Beyond Rows and Columns with Graph Analytics
Going Beyond Rows and Columns with Graph AnalyticsGoing Beyond Rows and Columns with Graph Analytics
Going Beyond Rows and Columns with Graph Analytics
 
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
 
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data WarehouseData Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
 
Big Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media AnalyticsBig Query - Utilizing Google Data Warehouse for Media Analytics
Big Query - Utilizing Google Data Warehouse for Media Analytics
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Fraud prevention is better with TigerGraph inside
Fraud prevention is better with  TigerGraph insideFraud prevention is better with  TigerGraph inside
Fraud prevention is better with TigerGraph inside
 
How to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top ContendersHow to Build Fast Data Applications: Evaluating the Top Contenders
How to Build Fast Data Applications: Evaluating the Top Contenders
 
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data GridsSpark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
 

Semelhante a ChakraView – A 360° Approach to Data Quality

Kafka For Financial Data Processing - The Flipkart Way (Shankar Manian and Ra...
Kafka For Financial Data Processing - The Flipkart Way (Shankar Manian and Ra...Kafka For Financial Data Processing - The Flipkart Way (Shankar Manian and Ra...
Kafka For Financial Data Processing - The Flipkart Way (Shankar Manian and Ra...
confluent
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
confluent
 
Overview of the financial architecture in oracle e business suite release 12
Overview of the  financial architecture in oracle e business suite release 12Overview of the  financial architecture in oracle e business suite release 12
Overview of the financial architecture in oracle e business suite release 12
magnificsairam
 

Semelhante a ChakraView – A 360° Approach to Data Quality (20)

Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Kafka For Financial Data Processing - The Flipkart Way (Shankar Manian and Ra...
Kafka For Financial Data Processing - The Flipkart Way (Shankar Manian and Ra...Kafka For Financial Data Processing - The Flipkart Way (Shankar Manian and Ra...
Kafka For Financial Data Processing - The Flipkart Way (Shankar Manian and Ra...
 
Overview of the financial architecture in oracle e business suite release 12
Overview of the  financial architecture in oracle e business suite release 12Overview of the  financial architecture in oracle e business suite release 12
Overview of the financial architecture in oracle e business suite release 12
 
Overview of the financial architecture in oracle e business suite release 12
Overview of the  financial architecture in oracle e business suite release 12Overview of the  financial architecture in oracle e business suite release 12
Overview of the financial architecture in oracle e business suite release 12
 
Overview of the financial architecture in oracle e business suite release 12
Overview of the  financial architecture in oracle e business suite release 12Overview of the  financial architecture in oracle e business suite release 12
Overview of the financial architecture in oracle e business suite release 12
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
 
The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit
 
NetSuite Reporting for High Transaction Volume & Self-Serve Businesses
NetSuite Reporting for High Transaction Volume & Self-Serve BusinessesNetSuite Reporting for High Transaction Volume & Self-Serve Businesses
NetSuite Reporting for High Transaction Volume & Self-Serve Businesses
 
Analysis, data & process modeling
Analysis, data & process modelingAnalysis, data & process modeling
Analysis, data & process modeling
 
How Intelligent Document Processing is Driving Accounts Receivable (AR) and A...
How Intelligent Document Processing is Driving Accounts Receivable (AR) and A...How Intelligent Document Processing is Driving Accounts Receivable (AR) and A...
How Intelligent Document Processing is Driving Accounts Receivable (AR) and A...
 
oracle Presntation.ppt
oracle Presntation.pptoracle Presntation.ppt
oracle Presntation.ppt
 
Danish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML OpsDanish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML Ops
 
Overview of the financial architecture in oracle e business suite release 12
Overview of the  financial architecture in oracle e business suite release 12Overview of the  financial architecture in oracle e business suite release 12
Overview of the financial architecture in oracle e business suite release 12
 
Overview of the financial architecture in oracle e business suite release 12
Overview of the  financial architecture in oracle e business suite release 12Overview of the  financial architecture in oracle e business suite release 12
Overview of the financial architecture in oracle e business suite release 12
 
Overview of the financial architecture in oracle e business suite release 12
Overview of the  financial architecture in oracle e business suite release 12Overview of the  financial architecture in oracle e business suite release 12
Overview of the financial architecture in oracle e business suite release 12
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
 
How Eastern Bank Uses Big Data to Better Serve and Protect its Customers
How Eastern Bank Uses Big Data to Better Serve and Protect its CustomersHow Eastern Bank Uses Big Data to Better Serve and Protect its Customers
How Eastern Bank Uses Big Data to Better Serve and Protect its Customers
 
When Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkWhen Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t Work
 
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
Delivering Self-Service Analytics using Big Data and Data Virtualization on t...
 

Mais de Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Último (20)

Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 

ChakraView – A 360° Approach to Data Quality

  • 1. ChakraView A 360° approach to data quality Shankar Manian Keerthika Thiyagarajan
  • 2. Background ● ~15 years in Big Data... ● ...as Data Janitors ● Can we do better ?
  • 3. Data Quality - Missing Focus ● Afterthought ● Needle in a haystack ● Huge cost
  • 4. Detection - Missing Dimensions ● Completeness ● Consistency ● Auditability
  • 5. Cleansing - The Hidden Cost ● Trace the issue to source ● No SOP on how to fix ● Hard to Automate
  • 6. Visibility - Or the lack of it ● Impact - Cost of bad data ● Breakdown and Prioritization ● Push quality upstream
  • 7. State before ● Stakeholder driven ● Reactive process ● Business metrics ● Huge monetary impact ● Iterative Discovery
  • 8. Validations Framework ● Granular Validations -> Business metrics ● Self serve onboarding ● Tigger on data refresh ● System health dashboard
  • 9. TransactionI d OrderId Amount B.Amount InvoiceId L.Amount TX1 OD1 100 100 I1 10 TX2 OD2 50 50 I2 50 TX3 OD3 75 75 I3 75 TX4 OD4 200 200 TX5 OD5 50 I5 50 Bad Records PaymentGateway * BankStatement * Ledger Amount Mismatch Entry missing in Ledger Entry missing in Bank statement
  • 10. Salient features ● Abstract templates ○ Null check ○ Datatype compliance ○ Aggregated check ○ Range check ○ Cross comparison check ● Filter and transformation support ○ Exclude few records ○ Case-insensitive conversion ● Construct target dataframe ● Row level results
  • 12. Sample Validation { "fact": [{ "fact_1": "payment_gateway", "fact_2": "ledger", "join_type": "full_outer_join", "join_columns": [{ "fact_1_column": "transaction_id", "fact_2_column": "transaction_id", "operator": "equal" }] }], "group_by_columns": ["transaction_id"], "idempotency_columns": [ "transaction_id" ], "validation_configurations": [{ "name": "amount_recon", "operator": "equal", "expression_list": [{ "expression": { "operator": "amount", "terminal": "pg_amount" } }, { "expression": { "operator": "l.amount", "terminal": "ledger_amount" } } ] }] }
  • 13. Data Flow Trigger from Azkaban Run spark job Publish validation failures Fact refresh Dashboard Datastore Template Library Validation Configuration
  • 14.
  • 15. Until now we were blissfully ignorant, Now we spend multiple man hours categorising the bad records
  • 16. TransactionId OrderId B.Amount InvoiceId L.Amount Category TX1 OD1 100 I1 10 Amount wrong in Ledger entry TX5 OD4 200 Upstream Failure- Payments TX6 OD6 I6 50 File upload issue Root Cause Analysis(RCA) Bank Statement * Ledger
  • 17.
  • 18. Combinatorial explosion ● The cycle is longer for big data due to ● Complexity of the system ● Time consuming ● Error prone ● Humanly impossible
  • 19. ● Real-time systems has ELK kind of tools ● No tools available for Big data to RCA How do we make this operation cheap?
  • 20. Auto-RCA ● Enrich logs and data from main pipeline
  • 21. Enrichments { "commerce_activity": { "activityType": "create_ledger", "activityId": "TX12345", "payload": "{"event":"create_ledger","entity_id":"TX12345"}", "eventStatus": "ERRORED", "retryCount": 0 }, "error_details": { "activityType": "create_ledger", "activityId": "TX12345", "errorCode": "503", "errorDescription": "Error: EnricherException{statusCode=503}", "sourceSystem": "IRN", "upstreamUriSignature": "/payment/<transaction>", "upstreamUrl": "/payment/TX12345", "upstreamHttpMethod": "GET", "upstreamHeader": null, "upstreamPayload": null, "errorStatus": "OPEN", "failureCount": null, } }
  • 22. Auto-RCA ● Perform 5 Why RCA ● Hierarchical categorisation ● Leaf category -> Unique issues
  • 23. Unclassified Amount mismatch Missing entries Missing entries in Bank statement Missing entries in ledger Issue in invoice creation Issue in Bank statement Event processing failure Event not arrived Wrong value in file File upload issue Data not pushed to analytical store Unclassified
  • 24.
  • 25. Fixture ● Can we automate cleaning the data?
  • 26. Fixture Event processing failure Event not arrived Wrong value in file File upload issue Data not pushed to analytical store reprocess_event replay_event reprocess_file republish_ledger_entry
  • 27. Fixture { "flowName": "debtor_flow", "categoryName": "Event processing failure", "recipeName": "reprocess_event" }
  • 28. Fixture ● Recipes - Library of functions that automate the cleansing ● Leaf Category -> Recipe ● Sample Recipes ○ Reverse ○ Retry ○ Restore
  • 30. ● Man-days reduced to few hours. ● Reactive to proactive ● Dev-friendly ● People independent ● Complete visibility
  • 31. Next Steps ● Open source ● Data observability ● Performance optimisation