Infogix Confidential Copyright 2020
Why You Need End-to-End Data Quality to Build Trust in Kafka
Webinar Speakers
Jeff Brown
Infogix, Inc., Director, Data Quality and Analytics
Jeff Brown has been with Infogix for more than 8 years. He is currently a Director in Product Management, responsible for delivering customer-driven solutions across the Infogix Data3Sixty Platform. He has a Bachelor of Science in Engineering from Michigan State University and an MBA from DePaul University.
“90% of the world’s data was created within the last two years” ~ Forbes
Kafka at a Glance…
• 90%: Kafka will be a mission-critical part of the organization in 2018**
• 62%: Kafka deployments replacing existing technology**
• 60%: Fortune 100 companies using Apache Kafka*
• +100K: Organizations using Apache Kafka worldwide*
*Kafka Summit San Francisco 2019; Jun Rao, Confluent
**2018 Apache Kafka Report; Confluent survey of 600 users from 59 countries
Why are organizations moving to a streaming-based architecture?
What is Apache Kafka?
• Kafka is an open source real-time streaming message system and protocol built around a publish-subscribe system*
• Producers publish data to feeds (topics) that consumers subscribe to and receive messages from
• Messages are stored in topics, which are split into partitions spread across the Kafka cluster; replicating those partitions across brokers supports redundancy
*Source: https://kafka.apache.org/intro
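The publish-subscribe flow described on this slide can be sketched with a toy in-memory model. This is not the Kafka client API: `ToyTopic`, `ToyConsumer`, the topic name and the hash-based partitioning are illustrative assumptions, and replication across brokers is left out entirely.

```python
from collections import defaultdict
from zlib import crc32


class ToyTopic:
    """In-memory stand-in for a topic: one append-only log per partition."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Messages with the same key always land in the same partition,
        # so their relative order is preserved within that partition.
        index = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class ToyConsumer:
    """Reads a topic at its own pace by remembering one offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        messages = []
        for index, log in enumerate(self.topic.partitions):
            start = self.offsets[index]
            messages.extend(log[start:])
            self.offsets[index] = len(log)
        return messages


orders = ToyTopic("orders")  # hypothetical topic name
for i in range(6):
    orders.publish(key=f"customer-{i % 2}", value=f"order-{i}")

billing = ToyConsumer(orders)    # two independent subscribers
analytics = ToyConsumer(orders)
first_poll = billing.poll()
print(len(first_poll), len(billing.poll()), len(analytics.poll()))  # 6 0 6
```

Each subscriber tracks its own read position, which is why `analytics` still receives all six messages after `billing` has already consumed them.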
Kafka Data Pipeline Flow
[Diagram: producers publish messages to topics on the Kafka platform, and consumers subscribe to those topics. Systems shown: data lake, database, application, applications / 3rd-party vendors, files, logs / IoT.]
Advantages of Apache Kafka
• Fault Tolerance: ensures that data is available even if failures occur within the cluster
• Real-Time Data Availability: enables reduced lag time in critical data-driven decisions
• Centralized Access to Data: provides a consolidated data hub approach to reduce complexity
• Data Storage Layer: acts as intermediary storage, enabling consumption when needed
• Scalability of Data Handling: supports high-volume data handling and data delivery
• Reduced Integration Points: lower complexity of data system communication
Key Drivers to Move to Kafka
• Create a unified data hub for the business to consume data
• Give better data access to data scientists and analytics teams
• Support data communication for digital transformation strategy
• Ability to make faster business decisions on more real-time data
Common challenges confronting organizations as they adopt Kafka
What are organizations saying?
“We are moving all system-to-system communication from file based to Kafka messages”
• New means of digital communication
• Recognize need for real-time data access
“We don’t trust the stability of our Kafka platform to expand its usage”
• Lack of trust in their Kafka platform
• Require insights into operations
“Audit will not let us move forward with our Kafka platform without being able to validate the data”
• Need validation on data in motion
• Auditability of data and process is still a key focus
New Technologies, Same Challenges
Organizations Should be Asking…
• How do we know if all data arrived in the correct order?
• Do we know if all transactions that were supposed to be sent were sent?
• Do we know if duplicate data transactions have been sent?
• Do we know if all transactions that were supposed to be sent arrived?
• What action should be taken on errors or potential lost data?
• Do we know if all transactions were sent and arrived in a timely manner?
• Do we know if all transactions were aggregated and transformed correctly?
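Three of these questions (arrival, duplicates and ordering) can be answered mechanically when every message carries a producer-assigned sequence number. The sketch below assumes such a number exists; `check_delivery` and `DeliveryReport` are hypothetical names used for illustration, not part of Kafka or of any Infogix product.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryReport:
    missing: list = field(default_factory=list)       # sent but never arrived
    duplicates: list = field(default_factory=list)    # arrived more than once
    out_of_order: list = field(default_factory=list)  # arrived after a later number


def check_delivery(sent_ids, received_ids):
    """Compare the sequence numbers a producer sent with what a consumer saw."""
    report = DeliveryReport()
    seen = set()
    highest = None
    for seq in received_ids:
        if seq in seen:
            report.duplicates.append(seq)
            continue
        seen.add(seq)
        if highest is not None and seq < highest:
            report.out_of_order.append(seq)
        else:
            highest = seq
    report.missing = sorted(set(sent_ids) - seen)
    return report


report = check_delivery(sent_ids=[1, 2, 3, 4, 5, 6],
                        received_ids=[1, 2, 4, 3, 3, 6])
print(report.missing, report.duplicates, report.out_of_order)  # [5] [3] [3]
```

Timeliness and transformation correctness need extra inputs (send and receive timestamps, expected aggregates) and are not covered here.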
What is the Impact of the Challenges?
IT / Operations
• Unable to monitor data volumes for anomalies
• Inaccurate prediction of data volume needed for retention
• Unable to identify underlying infrastructure issues
Business
• Incorrect data being consumed to make business decisions
• Potential customer loss, harm to reputation, revenue loss, regulatory fines
• Reduced overall trust
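Monitoring data volumes for anomalies can start with something as simple as comparing each interval's message count with a rolling baseline. The sketch below is one possible approach, not a description of any product feature; the window size, the three-sigma limit and the sample counts are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def volume_alerts(counts, window=5, max_sigma=3.0):
    """Flag intervals whose message count deviates from the recent baseline."""
    alerts = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        deviation = abs(counts[i] - mu)
        # A perfectly flat baseline has sigma == 0; treat any change as a violation.
        if (sigma == 0 and deviation > 0) or (sigma > 0 and deviation > max_sigma * sigma):
            alerts.append((i, counts[i]))
    return alerts


# Hypothetical messages-per-minute counts with one sharp drop at index 7.
per_minute = [1000, 1020, 990, 1010, 1005, 995, 1015, 120, 1000, 1008]
print(volume_alerts(per_minute))  # [(7, 120)]
```

Note that the outlier stays in the baseline for the next few intervals and widens the tolerance; a production check would likely exclude flagged points or use a more robust statistic.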
How do you build data trust within your organization?
Focus on Data Pipeline from End-to-End
Source
- Raw Data
- Source Data
- Third Party
Processing
- Semi-Processed Data
- Non-Aggregated
Finished Good
- Data Warehouse
- Data Mart
- MDM
Infogix Enables That Level of Independent Trust
[Diagram: producers and consumers connected through a Kafka cluster. Systems shown: data lake / warehouse, database, application, applications / 3rd-party vendors, files, logs / IoT.]
Trust is Built on a Multifaceted Approach
Producer to Consumer:
• Data Quality
• Reconciliation
• Balancing
• Integrity
Provide a 360° Standard to Data Trust
[Diagram labels: Producer, Consumer; Visualize, Remediate, Monitor, Validate; Bottom, Top.]
Data360® Platform
Data Quality for Streaming Data
• Real-Time and Batch Validation: Validate streaming data in real time or in batch to meet required time windows
• Balancing & Reconciliation: Reconcile data from source to target to ensure all messages arrived and values are balanced
• Visualize: Generate dashboards and track streaming data over time to highlight operational results
• Identify and Manage Exceptions: Identify, route and remediate streaming data exceptions
• Transformation & Aggregation: Capture, transform and aggregate data for both streaming and non-streaming data
• Statistical Control: Monitor streams and apply statistical controls such as threshold violations or standard deviations
• Machine Learning: Utilize ML to identify patterns and outliers within data streams for better insights
• Enrich Streaming Data: Enrich or join streaming data, then generate new streams or other output types
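As an illustration of the balancing and reconciliation item, the sketch below compares what a producer sent with what a consumer received on three levels: record counts, a control total and per-key values. It is a minimal sketch under assumed names (`reconcile`, the `id` and `amount` fields, the sample records), not the Data360 implementation.

```python
from decimal import Decimal


def reconcile(source, target, key="id", amount="amount"):
    """Return count, total and key-level differences between two record sets."""
    source_by_key = {r[key]: r for r in source}
    target_by_key = {r[key]: r for r in target}
    source_total = sum((Decimal(r[amount]) for r in source), Decimal("0"))
    target_total = sum((Decimal(r[amount]) for r in target), Decimal("0"))
    shared = source_by_key.keys() & target_by_key.keys()
    return {
        "count_difference": len(source) - len(target),
        "amount_difference": source_total - target_total,
        "missing_in_target": sorted(source_by_key.keys() - target_by_key.keys()),
        "unexpected_in_target": sorted(target_by_key.keys() - source_by_key.keys()),
        "value_mismatches": sorted(
            k for k in shared
            if Decimal(source_by_key[k][amount]) != Decimal(target_by_key[k][amount])
        ),
    }


# Hypothetical records: t3 never arrived and t2 arrived with a changed amount.
produced = [
    {"id": "t1", "amount": "100.00"},
    {"id": "t2", "amount": "250.50"},
    {"id": "t3", "amount": "75.25"},
]
consumed = [
    {"id": "t1", "amount": "100.00"},
    {"id": "t2", "amount": "205.50"},
]
result = reconcile(produced, consumed)
print(result)
```

Amounts are parsed as `Decimal` rather than `float` so that control totals balance exactly instead of drifting through rounding.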
Data360 Streaming
Streaming Functionality
• Streaming Data Store Input: Bring in data from a streaming data source
• Streaming Data Store Output: Output streaming data to a data source
• Convert to Micro Batch: Convert streaming data to batch data
• Streaming SQL: Stream input using SQL statements
• Streaming Join: Joins two streaming sources or a streaming and batch source
• Streaming Deduplication: Eliminates redundant streaming data
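To make the deduplication item concrete, here is one common way to drop repeated messages from an unbounded stream while keeping memory bounded. It is a sketch under assumptions (an `id` field identifies a message, only the most recent keys are remembered) and says nothing about how Data360 implements its node.

```python
from collections import OrderedDict


def deduplicate(messages, key=lambda m: m["id"], max_tracked=10_000):
    """Yield each message once, remembering only the most recently seen keys."""
    seen = OrderedDict()
    for message in messages:
        k = key(message)
        if k in seen:
            seen.move_to_end(k)       # refresh so active keys stay tracked
            continue
        seen[k] = None
        if len(seen) > max_tracked:
            seen.popitem(last=False)  # forget the oldest key
        yield message


stream = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}, {"id": 2}]
unique_ids = [m["id"] for m in deduplicate(stream)]
print(unique_ids)  # [1, 2, 3]
```

The bound is a trade-off: a duplicate that arrives after its key has been evicted passes through again, so `max_tracked` should cover the longest gap expected between a message and its repeat.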
Handling Messages In-Flight
Customer Use Cases
• Validate Streaming Data Inline
◦ Read messages from a Kafka Topic
◦ Apply data quality/custom rules to the message
◦ Determine if the message data passes/fails the rules
◦ Route the message to a corresponding Topic (valid/invalid)
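The four steps above can be sketched as a single routing function. The rule names, the `account_id` and `amount` fields and the topic names `payments.valid` and `payments.invalid` are hypothetical; reading from and writing to Kafka is left out so the example stays self-contained.

```python
import json

VALID_TOPIC = "payments.valid"      # hypothetical topic names
INVALID_TOPIC = "payments.invalid"


def not_empty(field):
    def rule(record):
        value = record.get(field)
        return isinstance(value, str) and value.strip() != ""
    return rule


def positive_number(field):
    def rule(record):
        value = record.get(field)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and value > 0
    return rule


RULES = {
    "account_id is present": not_empty("account_id"),
    "amount is positive": positive_number("amount"),
}


def route(raw_message, rules=RULES):
    """Return (destination_topic, payload) for one raw JSON message."""
    try:
        record = json.loads(raw_message)
    except json.JSONDecodeError:
        record = None
    if not isinstance(record, dict):
        return INVALID_TOPIC, {"raw": raw_message, "failed": ["is a JSON object"]}
    failed = [name for name, rule in rules.items() if not rule(record)]
    if failed:
        return INVALID_TOPIC, {"record": record, "failed": failed}
    return VALID_TOPIC, {"record": record}


for raw in ['{"account_id": "A1", "amount": 10}',
            '{"account_id": " ", "amount": -1}',
            "not json"]:
    print(route(raw))
```

Attaching the list of failed rules to the rejected payload keeps the reason for the rejection with the message, which helps whoever remediates the invalid topic.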
Handling Messages via Micro-Batches
Customer Use Cases
• Validate & Process Micro-Batched Streaming Data
◦ Read small batches (micro-batches) of messages from a Kafka Topic
◦ Apply DQ/custom rules or complex processing to the micro-batch
◦ Route the micro-batch to other downstream processes OR convert the micro-batch to messages and post to a Kafka Topic
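A micro-batch makes it possible to apply rules that need more than one message at a time, such as a batch total or a duplicate check. The sketch below groups a stream by size only; real micro-batching usually also closes a batch after a time limit, which is omitted here. The function names and the `id` and `amount` fields are assumptions.

```python
from itertools import islice


def micro_batches(messages, batch_size):
    """Group an iterable of messages into lists of at most batch_size items."""
    iterator = iter(messages)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def check_batch(batch):
    """A batch-level rule that cannot be evaluated one message at a time."""
    ids = [m["id"] for m in batch]
    return {
        "size": len(batch),
        "total": sum(m["amount"] for m in batch),
        "has_duplicates": len(ids) != len(set(ids)),
    }


# Hypothetical stream of seven messages, processed three at a time.
stream = ({"id": i, "amount": 10 * i} for i in range(1, 8))
summaries = [check_batch(b) for b in micro_batches(stream, batch_size=3)]
print([s["total"] for s in summaries])  # [60, 150, 70]
```

Because `micro_batches` is a generator, it works on an open-ended stream and holds only one batch in memory at a time.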
Handling Messages Streaming & Non-Streaming
Customer Use Cases
• Streaming & Non-Streaming Data
◦ Validate both streaming and non-streaming sources
◦ Join non-streaming data with streaming data messages
◦ Output to Kafka or non-streaming data types
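Joining non-streaming data with streaming messages often amounts to a lookup against a reference table loaded ahead of time. The sketch below shows that idea as a left join; `enrich`, the `customer_id` key and the sample data are illustrative assumptions.

```python
def enrich(stream, reference, key):
    """Left-join each streaming message against a static lookup table."""
    for message in stream:
        match = reference.get(message.get(key))
        if match is None:
            # Keep unmatched messages and mark them so they can be routed as exceptions.
            yield {**message, "matched": False}
        else:
            yield {**message, **match, "matched": True}


# Non-streaming reference data, for example loaded from a database table.
customers = {
    "c1": {"segment": "retail"},
    "c2": {"segment": "commercial"},
}
events = [
    {"customer_id": "c1", "amount": 40},
    {"customer_id": "c9", "amount": 15},
]
enriched = list(enrich(events, customers, key="customer_id"))
print(enriched)
```

Fields from the reference row overwrite same-named fields on the message, so in practice the two sides should use distinct field names or the merge order should be chosen deliberately.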
Infogix Provides Single Solution to Build Data
• Built-in Quality and Exception Tracking
• Route, Workflow, Resolve Issues
• +100 Pre-Defined Data Quality Rules
What are we Hearing from Customers?
• An international bank with +$100B in assets is working with us on reconciliation for its streaming architecture
• A large financial institution is working with us to deliver Kafka capabilities as part of its data integrity and data quality controls group
• One of the largest health insurers in the world, and a 30-year Infogix customer, has initiated discussions with us around our Kafka solutions
Key Takeaways
• Organizations are sprinting towards adopting Kafka, but will be faced with the same operational data quality issues as before
• Faster data delivery and higher data volumes will lead to increased data quality issues if not managed properly
• The entire data pipeline from end-to-end must be validated and monitored to ensure trust and optimize streaming data investments
Find out more:
• Infographic
• eBook
• Data Sheet
• Blogs
• Visit our resource center to learn more about Infogix and Kafka
www.infogix.com
Jeff Brown
Infogix, Inc., Director, Data Quality and Analytics
Email: jbrown@infogix.com
Phone: 1.630.505.5566
Questions?
Please submit your questions via the web in the Q&A panel in the lower right-hand corner of your screen.
If we do not get to your question, we will personally follow up with you after the event.

Mais conteúdo relacionado

Mais procurados

Evaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJAEvaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
DataWorks Summit
 

Mais procurados (20)

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Patterns of resilience
Patterns of resiliencePatterns of resilience
Patterns of resilience
 
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
IBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
IBM Integration Bus & WebSphere MQ - High Availability & Disaster RecoveryIBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
IBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
 
Batch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing DifferenceBatch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing Difference
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Orchestration Patterns for Microservices with Messaging by RabbitMQ
Orchestration Patterns for Microservices with Messaging by RabbitMQOrchestration Patterns for Microservices with Messaging by RabbitMQ
Orchestration Patterns for Microservices with Messaging by RabbitMQ
 
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJAEvaluation of TPC-H on Spark and Spark SQL in ALOJA
Evaluation of TPC-H on Spark and Spark SQL in ALOJA
 
Real time data quality on Flink
Real time data quality on FlinkReal time data quality on Flink
Real time data quality on Flink
 

Semelhante a Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka

DevOps is to Infrastructure as Code, as DataOps is to...?
DevOps is to Infrastructure as Code, as DataOps is to...?DevOps is to Infrastructure as Code, as DataOps is to...?
DevOps is to Infrastructure as Code, as DataOps is to...?
Data Con LA
 
Liberate Legacy Data Sources with Precisely and Databricks
Liberate Legacy Data Sources with Precisely and DatabricksLiberate Legacy Data Sources with Precisely and Databricks
Liberate Legacy Data Sources with Precisely and Databricks
Precisely
 

Semelhante a Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka (20)

Cisco Connect Toronto 2018 an introduction to Cisco kinetic
Cisco Connect Toronto 2018   an introduction to Cisco kineticCisco Connect Toronto 2018   an introduction to Cisco kinetic
Cisco Connect Toronto 2018 an introduction to Cisco kinetic
 
Cisco Connect Toronto 2018 an introduction to Cisco kinetic
Cisco Connect Toronto 2018   an introduction to Cisco kineticCisco Connect Toronto 2018   an introduction to Cisco kinetic
Cisco Connect Toronto 2018 an introduction to Cisco kinetic
 
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
Transform Your Mainframe and IBM i Data for the Cloud with Precisely and Apac...
 
30 March 2017 - Vuzion Ireland Love Cloud
30 March 2017 - Vuzion Ireland Love Cloud30 March 2017 - Vuzion Ireland Love Cloud
30 March 2017 - Vuzion Ireland Love Cloud
 
Streaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache KafkaStreaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache Kafka
 
DevOps is to Infrastructure as Code, as DataOps is to...?
DevOps is to Infrastructure as Code, as DataOps is to...?DevOps is to Infrastructure as Code, as DataOps is to...?
DevOps is to Infrastructure as Code, as DataOps is to...?
 
Democratized Data & Analytics for the Cloud​
Democratized Data & Analytics for the Cloud​Democratized Data & Analytics for the Cloud​
Democratized Data & Analytics for the Cloud​
 
Turning Big Data into Better Business Outcomes
Turning Big Data into Better Business OutcomesTurning Big Data into Better Business Outcomes
Turning Big Data into Better Business Outcomes
 
BIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in LogisticsBIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in Logistics
 
CL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and PlanningCL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and Planning
 
Journey to the Cloud with Precisely
Journey to the Cloud with Precisely Journey to the Cloud with Precisely
Journey to the Cloud with Precisely
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Hybrid Integration
Hybrid IntegrationHybrid Integration
Hybrid Integration
 
IICS_Capabilities.pptx
IICS_Capabilities.pptxIICS_Capabilities.pptx
IICS_Capabilities.pptx
 
20181212 AWS NL - Informatica Cloud Overview
20181212 AWS NL - Informatica Cloud Overview20181212 AWS NL - Informatica Cloud Overview
20181212 AWS NL - Informatica Cloud Overview
 
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data VirtualizationDenodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
 
Liberate Legacy Data Sources with Precisely and Databricks
Liberate Legacy Data Sources with Precisely and DatabricksLiberate Legacy Data Sources with Precisely and Databricks
Liberate Legacy Data Sources with Precisely and Databricks
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
 
Foundational Strategies for Trust in Big Data Part 3: Data Lineage
Foundational Strategies for Trust in Big Data Part 3: Data LineageFoundational Strategies for Trust in Big Data Part 3: Data Lineage
Foundational Strategies for Trust in Big Data Part 3: Data Lineage
 

Mais de DATAVERSITY

The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
DATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
DATAVERSITY
 

Mais de DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 

Último

Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 

Último (20)

5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 

Slides: Why You Need End-to-End Data Quality to Build Trust in Kafka

  • 1. Infogix Confidential Copyright 2020Infogix Confidential Copyright 2020 Why You Need End-to-End Data Quality to Build Trust in Kafka
  • 2. Infogix Confidential Copyright 2020 Webinar Speakers Jeff Brown Infogix, Inc., Director, Data Quality and Analytics Jeff Brown has been with Infogix for more than 8 years. He is currently a Director in Product Management responsible for delivering customer driven solutions across the Infogix Data3Sixty Platform. He has a Bachelor of Science in Engineering from Michigan State University and an MBA from DePaul University.
  • 3. Infogix Confidential Copyright 2020 “90% of the world’s data was created within the last two years” ~ Forbes
  • 4. Infogix Confidential Copyright 2020 Kafka at a Glance… Kafka will be a mission-critical part of organization in 2018** Kafka deployments replacing existing technology** 90% 62% Fortune 100 Companies using Apache Kafka* Organizations Globally using Apache Kafka Worldwide* 60% +100K *Kafka Summit San Francisco 2019; Jun Rao: Confluent **2018 Apache Kafka Report; Confluent Survey of 600 Users from 59 Countries
  • 5. Why are organizations moving to a streaming-based architecture?
  • 6. What is Apache Kafka?
      ◦ Kafka is an open source real-time streaming message system and protocol built around a publish-subscribe system*
      ◦ Producers publish data to feeds (topics) that consumers subscribe to and receive messages from
      ◦ Messages are stored in topics, which are split into partitions across the Kafka cluster; replicating those partitions supports redundancy
      ◦ *Source: https://kafka.apache.org/intro
  • 7. Kafka Data Pipeline Flow [Diagram: producers publish messages to topics on the Kafka platform; consumers subscribe to those topics. Systems shown: Data Lake, Database, Application, Applications / 3rd Party Vendors, Files, Logs / IoT]
  • 8. Advantages of Apache Kafka
      ◦ Fault Tolerance: ensures that data is available even if failures occur within the cluster
      ◦ Real-Time Data Availability: enables reduced lag time in critical data-driven decisions
      ◦ Centralized Access to Data: provides a consolidated data hub approach to reduce complexity
      ◦ Data Storage Layer: acts as intermediary storage, enabling consumption when needed
      ◦ Scalability of Data Handling: supports high-volume data handling and data delivery
      ◦ Reduced Integration Points: lower complexity of data system communication
  • 9. Key Drivers to Move to Kafka
      ◦ Create a unified data hub for the business to consume data
      ◦ Give better data access to data scientists and analytics teams
      ◦ Support data communication for digital transformation strategy
      ◦ Ability to make faster business decisions on more real-time data
  • 10. Common challenges confronting organizations as they adopt Kafka
  • 11. What are organizations saying?
      ◦ “We are moving all system-to-system communication from file based to Kafka messages”: new means of digital communication; recognize need for real-time data access
      ◦ “We don’t trust the stability of our Kafka platform to expand its usage”: lack of trust in their Kafka platform; require insights into operations
      ◦ “Audit will not let us move forward with our Kafka platform without being able to validate the data”: need validation on data in motion; auditability of data and process is still a key focus
  • 12. New Technologies, Same Challenges: Organizations Should be Asking…
      ◦ How do we know if all data arrived in the correct order?
      ◦ Do we know if all transactions that were supposed to be sent were sent?
      ◦ Do we know if duplicate data transactions have been sent?
      ◦ Do we know if all transactions that were supposed to be sent arrived?
      ◦ What action should be taken on errors or potential lost data?
      ◦ Do we know if all transactions were sent and arrived in a timely manner?
      ◦ Do we know if all transactions were aggregated and transformed correctly?
  • 13. What is the Impact of the Challenges?
      ◦ IT / Operations: unable to monitor data volumes for anomalies; inaccurate prediction of data volume needed for retention; unable to identify underlying infrastructure issues
      ◦ Business: incorrect data being consumed to make business decisions; potential customer loss, harm to reputation, revenue loss, regulatory fines; reduced overall trust
  • 14. How do you build data trust within your organization?
  • 15. Focus on Data Pipeline from End-to-End
      ◦ Source: Raw Data, Source Data, Third Party
      ◦ Processing: Semi-Processed Data, Non-Aggregated
      ◦ Finished Good: Data Warehouse, Data Mart, MDM
  • 16. Infogix Enables That Level of Independent Trust [Diagram: producers and consumers connected through a Kafka cluster. Systems shown: Data Lake / Warehouse, Database, Application, Applications / 3rd Party Vendors, Files, Logs / IoT]
  • 17. Trust is Built on a Multifaceted Approach [Diagram labels: Producer to Consumer; Data Quality; Reconciliation; Balancing; Integrity]
  • 18. Provide a 360° Standard to Data Trust [Diagram labels: Producer, Consumer; Visualize, Remediate, Monitor, Validate; Bottom, Top]
  • 19. Data360® Platform
  • 20. Data Quality for Streaming Data
      ◦ Real-Time and Batch Validation: Validate streaming data in real time or in batch to meet required time windows
      ◦ Balancing & Reconciliation: Reconcile data from source to target to ensure all messages arrived and values are balanced
      ◦ Visualize: Generate dashboards and track streaming data over time to highlight operational results
      ◦ Identify and Manage Exceptions: Identify, route and remediate streaming data exceptions
      ◦ Transformation & Aggregation: Capture, transform and aggregate data for both streaming and non-streaming data
      ◦ Statistical Control: Monitor streams and apply statistical controls such as threshold violations or standard deviations
      ◦ Machine Learning: Utilize ML to identify patterns and outliers within data streams for better insights
      ◦ Enrich Streaming Data: Enrich or join streaming data, then generate new streams or other output types
  • 21. Data360 Streaming: Streaming Functionality
      ◦ Streaming Data Store Input: Bring in data from a streaming data source
      ◦ Streaming Data Store Output: Output streaming data to a data source
      ◦ Convert to Micro Batch: Convert streaming data to batch data
      ◦ Streaming SQL: Stream input using SQL statements
      ◦ Streaming Join: Joins two streaming sources or a streaming and batch source
      ◦ Streaming Deduplication: Eliminates redundant streaming data
  • 22. Handling Messages In-Flight (Customer Use Cases): Validate Streaming Data Inline
      ◦ Read messages from a Kafka Topic
      ◦ Apply data quality/custom rules to the message
      ◦ Determine if the message data passes/fails the rules
      ◦ Route the message to a corresponding Topic (valid/invalid)
  • 23. Handling Messages via Micro batches (Customer Use Cases): Validate & Process Micro batched Streaming Data
      ◦ Read small batches (micro batch) of messages from a Kafka Topic
      ◦ Apply DQ/custom rules or complex processing to the micro batch
      ◦ Route the micro batch to other downstream processes OR convert micro batch to messages and post to a Kafka Topic
  • 24. Handling Messages Streaming & Non-Streaming (Customer Use Cases): Streaming & Non-Streaming Data
      ◦ Validate both streaming and non-streaming sources
      ◦ Join non-streaming data with streaming data messages
      ◦ Output to Kafka or non-streaming data types
  • 25. Infogix Provides Single Solution to Build Data
      ◦ Built-in Quality and Exception Tracking
      ◦ Route, Workflow, Resolve Issues
      ◦ +100 Pre-Defined Data Quality Rules
  • 26. What are we Hearing from Customers?
      ◦ An international bank with +$100B in assets is working with us for reconciliation on streaming architecture
      ◦ A large financial institution is working with us to deliver Kafka capabilities as part of their data integrity and data quality controls group
      ◦ One of the largest health insurers in the world, and a 30-year Infogix customer, has initiated discussions with us around our Kafka solutions
  • 27. Key Takeaways
      ◦ Organizations are sprinting towards adopting Kafka, but will be faced with the same operational data quality issues as before
      ◦ Faster data delivery and higher data volumes will lead to increased data quality issues if not managed properly
      ◦ The entire data pipeline from end-to-end must be validated and monitored to ensure trust and optimize streaming data investments
  • 28. Find out more: Infographic, eBook, Data Sheet, Blogs. Visit our resource center to learn more about Infogix and Kafka: www.infogix.com. Contact: Jeff Brown, Director, Data Quality and Analytics, Infogix, Inc. Email: jbrown@infogix.com Phone: 1.630.505.5566
  • 29. Questions? Please submit your questions via the web in the Q&A panel in the lower right-hand corner of your screen. If we do not get to your question, we will personally follow up with you after the event.
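
The inline validation flow on slide 22 (read messages from a topic, apply rules, decide pass/fail, route to a valid or invalid topic) can be illustrated with a small sketch. This is not Infogix's implementation and it does not talk to a real Kafka cluster: plain Python lists stand in for topics, and the topic names (`orders.valid`, `orders.invalid`), the two rules, and the message fields (`id`, `amount`) are all assumptions made up for the example.

```python
import json
from collections import defaultdict

# Hypothetical topic names; the slides only say "valid/invalid".
VALID_TOPIC = "orders.valid"
INVALID_TOPIC = "orders.invalid"


def has_required_fields(record):
    """Hypothetical rule: every record carries an id and an amount."""
    return "id" in record and "amount" in record


def amount_is_non_negative(record):
    """Hypothetical rule: amount is a non-negative number."""
    amount = record.get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    return is_number and amount >= 0


RULES = [has_required_fields, amount_is_non_negative]


def failed_rules(raw_message, rules=RULES):
    """Return the names of the checks that a raw JSON message fails."""
    try:
        record = json.loads(raw_message)
    except json.JSONDecodeError:
        return ["parseable_json"]
    if not isinstance(record, dict):
        return ["json_object"]
    return [rule.__name__ for rule in rules if not rule(record)]


def route(messages, rules=RULES):
    """Route each message to the valid or the invalid topic.

    `messages` stands in for records read from a Kafka topic; the returned
    dict of lists stands in for the two output topics.
    """
    topics = defaultdict(list)
    for raw in messages:
        failures = failed_rules(raw, rules)
        if failures:
            # Keep the original payload and the reasons for later remediation.
            topics[INVALID_TOPIC].append({"message": raw, "failed": failures})
        else:
            topics[VALID_TOPIC].append(raw)
    return dict(topics)


incoming = [
    '{"id": 1, "amount": 25.0}',
    '{"id": 2, "amount": -5}',
    '{"amount": 10}',
    "not json",
]
routed = route(incoming)
for topic, items in routed.items():
    print(topic, len(items))
```

In a real pipeline the `incoming` list would be replaced by a consumer loop and the appends by producer sends. Keeping the failure reasons next to the rejected payload is one way to support the "identify, route and remediate" step described on slide 20.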