Pallavi Angraje, Sona Samad
Data Observability: The Force Awakens
Oct 14, 2022
©2022 Intuit Inc. All rights reserved.

Agenda
● About Us
  Intuit Data Ecosystem: Unique consumer and small business assets at scale
● Data & Analytics at Intuit
  Product Reports and Analytics Dashboards
● Data Quality Challenges
  Understanding data issues
● Data Observability
  Cure, Detect, Prevent, Eradicate: Data Observability Model in Intuit
● Achieving Data Quality At Scale
  Preventing Data Incidents Using DQ Checks, ADS & Infrastructure Monitoring

Intuit Data Ecosystem: Unique consumer and small business assets at scale

Intuit Product Report

The monthly aggregated metrics for subscribers do not match the weekly data. Are there any issues?
- Lisa (Frustrated Data Worker)

How do the challenges originate?
● Data Sources: Do not understand how the data is used downstream
● Data Lake: Trouble identifying the ways the data pipelines might break
● Analytics: Hard to understand what’s wrong in the data and how to get help

What is not going well?
● Incorrect Reporting
● More Incidents
● High MTTD/MTTR
● Missed SLA
● Data Reload
● Parity Issues

Data Analytics Platform
● Event Parity Issues
● Source Failures / Delays
● Data / Processing Defects
● Prediction Model Defects
● Incorrect Reporting
● SLA Misses

Achieving Data Quality at Scale

Let’s Recall the Problem
● Incorrect Reporting
● More Incidents
● High MTTD/MTTR
● Missed SLA
● Data Reload
● Parity Issues

Cure, Detect, Prevent, Eradicate
Monitoring Dashboard Alerting Protocol

Preventing Data Incidents: Data Quality Checks
Diagram labels: Circuit Breaker Checks; Low Priority Alerts; Data Validation Checks
Key Design Decisions
● Multiple Source Support
● Performance Consistency & Scale
● Config Driven Data Profile Rules
● Capability to add Business Rules
● Run as part of Data Processing Pipelines
● Data Discrepancy and Anomaly Detection
● Fail Fast with Circuit Breaker
● Multi-channel Alerts, Single Window Reporting
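The design decisions above (config-driven rules, failing fast with a circuit breaker, alerts split by priority) can be illustrated with a minimal Python sketch. This is not Intuit's library: the rule schema, the names `Rule`, `CircuitBreakerError`, and `run_checks`, and the sample rules are all hypothetical, chosen only to show how a high-priority failure could halt a pipeline while a low-priority failure merely produces an alert.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


class CircuitBreakerError(RuntimeError):
    """Hypothetical error raised to stop the pipeline on a high-priority failure."""


@dataclass
class Rule:
    name: str
    priority: str                       # "high" trips the breaker, "low" only alerts
    check: Callable[[List[Row]], bool]  # returns True when the data passes


# Hypothetical rule set; a config-driven setup would load this from a file.
RULES = [
    Rule("non_empty", "high", lambda rows: len(rows) > 0),
    Rule("company_id_not_null", "high",
         lambda rows: all(r.get("c_id") is not None for r in rows)),
    Rule("name_present", "low",
         lambda rows: all(r.get("company_name") for r in rows)),
]


def run_checks(rows: List[Row], rules: List[Rule] = RULES) -> List[str]:
    """Run every rule in order; raise on the first high-priority failure."""
    alerts = []
    for rule in rules:
        if rule.check(rows):
            continue
        alerts.append(f"{rule.priority.upper()}: {rule.name} failed")
        if rule.priority == "high":
            # Fail fast: downstream steps never see the bad data.
            raise CircuitBreakerError(alerts[-1])
    return alerts
```

Calling `run_checks` as a step inside a pipeline means bad data stops the job before it reaches reports, while low-priority findings are returned for alerting.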
Diagram labels: Data Pipeline 1; High Priority Alerts; Data Pipeline 2; Data Quality Checks

Diagram labels: Data Source; Data Quality Rules; Data Quality Library; Reports / Alerts

Diagram labels:
● Input Sources: DB, file (.parquet, .csv, .xml, .json), Object Store
● Input Configs: Spark SQL, Spark Config
● Spark Process: Spark Step 1, Spark Step 2, Spark Step n; Spark Loader Class; Dataframe 1, Dataframe 2, Dataframe 3, Dataframe n; Dataframe Comparator; Dataframe Validator
● Output: Alerts, Triage Dashboard, DB, logs, Object Store

Spark Loader Class: Load, Transform, Save (dataframe)
PreStep = {
  # Custom Scala Class
  class-name = com.intuit.MySparkLoadClass,
  inputs = {
    my-company-gns-df = {
      order = 2,
      # Custom Spark SQL
      sql = {
        sql-type = local,
        out-path = "s3://temp/path",
        table = companies_filtered,
        sql = """select a.* from demo.company_info a
                 join demo.company_status b on a.c_id = b.c_id
                 where instr(a.company_name, 'delete') = 0"""
      },
      metadata = {is-input = true, is-save = true}
    }
  }
}
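To see what the SQL in this step selects, the query can be run against a few sample rows. The sketch below uses Python's built-in `sqlite3` as a stand-in for Spark SQL (an assumption made only so the example runs anywhere; `instr` behaves the same way for this case). The table columns beyond `c_id` and `company_name`, and all sample rows, are hypothetical.

```python
import sqlite3

# The query from the PreStep config, unchanged.
QUERY = """select a.* from demo.company_info a
join demo.company_status b on a.c_id = b.c_id
where instr(a.company_name, 'delete') = 0"""


def filtered_companies():
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the "demo." prefix resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS demo")
    conn.execute("CREATE TABLE demo.company_info (c_id INTEGER, company_name TEXT)")
    conn.execute("CREATE TABLE demo.company_status (c_id INTEGER, status TEXT)")
    # Hypothetical sample data.
    conn.executemany(
        "INSERT INTO demo.company_info VALUES (?, ?)",
        [(1, "Acme"), (2, "Old Co delete"), (3, "No Status Inc")],
    )
    conn.executemany(
        "INSERT INTO demo.company_status VALUES (?, ?)",
        [(1, "active"), (2, "active")],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(filtered_companies())
```

Company 2 is dropped because its name contains 'delete', and company 3 is dropped because the inner join finds no matching status row, so only company 1 remains.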

compare_1 = {
  class-name = com.intuit.DataframeComparator,
  properties = {
    "comparator-config" = """{
      "comparisonSets": [{"Product": "df_dataset_1"}, {"Billing System": "df_dataset_2"}],
      "validationName": "Test QBO signup count by product",
      "comparisonType": "percent_variance",
      "threshold": "1.00",
      "comp_out_df": "df_gns_by_product"
    }"""
  }
}
Annotated fields: class-name (Custom Scala Class), comparisonSets (Comparison Sets), validationName (Validation Name), comparisonType (Comparison Type), threshold (Threshold value), comp_out_df (Output Dataframe).
Diagram: df_dataset_1 and df_dataset_2, together with the config input, feed the Dataframe Comparator, which produces comp_out_df.
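The slides do not define how `percent_variance` is computed, so the following Python sketch assumes one plausible reading: the absolute difference between the two metrics, as a percentage of the first dataset's value, compared against the threshold. The function names, the dictionary-based inputs, and the product keys are hypothetical; the output field names borrow from the result columns used elsewhere in the deck.

```python
def percent_variance(a: float, b: float) -> float:
    """Assumed definition: |a - b| as a percentage of |a|."""
    if a == 0:
        return 0.0 if b == 0 else float("inf")
    return abs(a - b) / abs(a) * 100.0


def compare(dataset_1, dataset_2, threshold=1.00,
            validation_name="signup count by product"):
    """Compare one metric per key across two datasets (dicts of key -> value)."""
    rows = []
    for key in sorted(set(dataset_1) | set(dataset_2)):
        m1 = dataset_1.get(key, 0)  # a key missing on one side counts as 0
        m2 = dataset_2.get(key, 0)
        variance = percent_variance(m1, m2)
        rows.append({
            "validation_name": validation_name,
            "dimension_1": key,
            "metric_1_value": m1,
            "metric_2_value": m2,
            "is_valid": variance <= threshold,
        })
    return rows


# Hypothetical counts from the two systems being triangulated.
product = {"product_a": 1000, "product_b": 200}
billing = {"product_a": 1005, "product_b": 190}
for row in compare(product, billing):
    print(row["dimension_1"], row["is_valid"])
```

With a threshold of 1.00, `product_a` (about 0.5% apart) passes and `product_b` (about 5% apart) fails.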

DatamartValidation = {
  class-name = com.intuit.DataframeValidator,
  properties = {
    "validation-config" = """{
      "pipelineName": "Test",
      "validationResultSets": ["my-company-gns-df", "df_gns_by_product",..],
      "resultGenericColumns": ["validation_name:string",
        "dimension_1:string", "dimension_2:string",
        "dataset_1:string", "metric_1_value:decimal(20,4)",
        "dataset_2:string", "metric_2_value:decimal(20,4)",
        "validation_result:string", "is_valid:boolean"],
      "outputDirectory": "s3://temp/outpath",
      "emailId": "alertsmail@intuit.com",
      "validatorOptions": {"failJob": "1", "emailSubject": "Test Triangulation", "forwardToSplunk": "1"}
    }"""
  }
}
Annotated fields: class-name (Custom Scala Class), validationResultSets (Validation Result Sets), resultGenericColumns (Result Set Columns), emailId (Alert Email).
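As a rough illustration of how a validator might consume this config, the Python sketch below parses the `name:type` column specs, projects each result row onto those generic columns, and stops the job when `failJob` is "1" and any row is invalid. The behaviour is an assumption inferred from the field names; `parse_columns`, `validate`, and `ValidationFailed` are hypothetical and do not describe the actual `DataframeValidator` class.

```python
RESULT_COLUMNS = [
    "validation_name:string", "dimension_1:string", "dimension_2:string",
    "dataset_1:string", "metric_1_value:decimal(20,4)",
    "dataset_2:string", "metric_2_value:decimal(20,4)",
    "validation_result:string", "is_valid:boolean",
]


class ValidationFailed(RuntimeError):
    """Hypothetical error used to fail the job."""


def parse_columns(specs):
    # Split on the first colon only, so "decimal(20,4)" stays intact.
    return [tuple(spec.split(":", 1)) for spec in specs]


def validate(rows, options):
    names = [name for name, _ in parse_columns(RESULT_COLUMNS)]
    # Project every row onto the generic columns; absent fields become None.
    normalized = [{name: row.get(name) for name in names} for row in rows]
    # A missing is_valid is treated as a failure in this sketch.
    failed = [r["validation_name"] for r in normalized if not r["is_valid"]]
    if failed and options.get("failJob") == "1":
        raise ValidationFailed(", ".join(str(name) for name in failed))
    return normalized
```

Projecting every comparison onto one fixed set of columns is what would let results from different checks land in a single table for reporting.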

Preventing Data Incidents: Anomaly Detection Checks
Features
● Ensemble of machine learning algorithms
● Training done from historic patterns
● Supports time series data
● Scheduled and API-based triggers for training and inference
● Post-inference anomalies published for consumption
Diagram labels: Time Series Dataset; Majority Voting; Anomaly / Not Anomaly
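The majority-voting idea can be sketched in a few lines of Python. The slides do not name the algorithms in the ensemble, so three simple statistical detectors (z-score, interquartile range, median absolute deviation) stand in for the machine learning models here; the detectors, their thresholds, and the sample series are all assumptions made for illustration.

```python
import statistics


def zscore_detector(series, threshold=2.5):
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [False] * len(series)
    return [abs(x - mean) / sd > threshold for x in series]


def iqr_detector(series, k=1.5):
    q1, _, q3 = statistics.quantiles(series, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x < low or x > high for x in series]


def mad_detector(series, threshold=3.5):
    med = statistics.median(series)
    mad = statistics.median([abs(x - med) for x in series])
    if mad == 0:
        return [False] * len(series)
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [0.6745 * abs(x - med) / mad > threshold for x in series]


def majority_vote(series, detectors=(zscore_detector, iqr_detector, mad_detector)):
    """Flag a point as an anomaly when more than half of the detectors agree."""
    votes = [detector(series) for detector in detectors]
    needed = len(detectors) // 2 + 1
    return [sum(flags) >= needed for flags in zip(*votes)]


# Hypothetical daily metric with one obvious spike at index 7.
series = [100, 102, 98, 101, 99, 103, 100, 250, 97, 102, 101, 99]
flags = majority_vote(series)
print([i for i, is_anomaly in enumerate(flags) if is_anomaly])
```

Voting reduces false alarms from any single detector: a point is reported only when at least two of the three methods flag it.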

Data Anomaly Detection

Data Triage - Validating Data at Each Layer

Spark - Performance Optimization
● Spark Lens
● Ganglia Chart
● Spark History Server

Preventing Data Incidents at 3 layers

Data Analytics Platform
● Parity checks to identify event loss
● Source Job Failure / Delay Alerts
● Resource Health Monitoring
● Anomaly Detection for Data Outliers
● Improved Forecast Models
● Data Quality Checks & Circuit Breakers
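The slides do not describe how the parity checks are implemented. One simple way to detect event loss, sketched below in Python under that caveat, is to compare the event IDs emitted by a source with the IDs that landed in the data lake. The function name `parity_report` and the sample IDs are hypothetical.

```python
from collections import Counter


def parity_report(source_event_ids, landed_event_ids):
    """Compare emitted and landed event IDs and summarize the gap."""
    source = Counter(source_event_ids)
    landed = Counter(landed_event_ids)
    missing = source - landed      # emitted but never landed (event loss)
    unexpected = landed - source   # duplicates or IDs the source never sent
    total = sum(source.values())
    loss_pct = 100.0 * sum(missing.values()) / total if total else 0.0
    return {
        "missing": sorted(missing.elements()),
        "unexpected": sorted(unexpected.elements()),
        "loss_pct": loss_pct,
    }


# Hypothetical IDs: e3 was lost in transit and e4 landed twice.
print(parity_report(["e1", "e2", "e3", "e4"], ["e1", "e2", "e4", "e4"]))
```

A report like this could feed an alert whenever the loss percentage rises above an agreed tolerance.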

Let’s ask Lisa
Now I trust the data!
Q&A
