SlideShare uma empresa Scribd logo
1 de 26
1
Simplify Governance of
Streaming Data
Gwen Shapira
Product Manager, Apache Kafka™ PMC
2
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent
Control Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product
Marketing
3
What is Data Governance?
Data Access
Master Data
Management
Data Agility
Data Reliability
Data Quality
4
To Grow a Successful Data Platform, You need Some Guarantees
5
If you can’t trust your data,
what can you trust?
7
Schemas Are a Key Part of This
1. Schemas: Enforced Contracts Between your Components
2. Schemas need to be agile but compatible
3. You need tools & process that will let you:
• Move fast and not break things
9
Cautionary tale: Uber
Producers
Service 1 Service 2
RedshiftS3
EMR
{created_at: “Dec 1,
2015”}
CREATE (
created_at: STRING )
parseDate(created_at, “%M %d,
%yyyy”)
10
Producers
Service 1 Service 2
RedshiftS3
EMR
{created_at:
1491368429}
CREATE (
created_at: STRING )
parseDate(created_at, “%M %d,
%yyyy”)
Cautionary tale: Uber
11
The right way to do things
MessageCustomer
RoomGifts
BookRoom
LoyalCustomers
12
The right way to do things
MessageCustomer
RoomGifts
BookRoom
LoyalCustomers
BeachPromo If (property.location == beach &&
customer.level == platinum)
sendMessage({topic: RoomGifts, customer_id: customer.id,
room: property.roomnum, gift: seaShell})
13
There is special value in standardizing parts of schema
{ "type": "record",
"name": "Event",
"fields": [{ "name": "headers", "type": {
“name” : “headers”,
“type”: ”record”
“fields”: [
{“name”: “source_app”, “type”: “string”},
{“name”: “host_app”, “type”: “string”},
{“name”: “destination_app”, “type”: [“null,“string”]},
{“name”: “timestamp”, “type”: “long”}
]},
{ "name": "body", "type": "bytes" }] }
Standard headers allowed
analyzing patterns of communication
S3
15
In Request Response World – APIs between services are key
In Async Stream Processing World – Message Schemas ARE the
API
…except they stick around for a lot longer
16
Lets assume you see why you need
schema compatibility…
How do we make it work for us?
17
Schema Registry
Elastic
Cassandra
Streams
App
Example
Consumers
Serializer
App 1
Serializer
App 2
!
Kafka Topic!
Schema
Registry
Define the expected fields for each Kafka topic
Automatically handle schema changes (e.g. new fields)
Prevent incompatible changes
Supports multi-datacenter environments
18
Key Points:
• Schema Registry is efficient – due to caching
• Schema Registry is transparent
• Schema Registry is Highly Available
• Schema Registry prevents bad data at the source
19
Big Benefits
• Schema Management is part of your entire app lifecycle – from dev
to prod
• Single source of truth for all apps and data stores
• No more “Bad Data” randomly breaking things
• Increased agility! add, remove and modify fields with confidence
• Teams can share and discover data
• Smaller messages on the wire and in storage
20
But the Best Part is…
RDBM
Hadoop /
Hive
Kafka w/
Schema
Registry
JDBC
Source
HDFS
Sink
Schema
Schema
Schema
Schema
Schema
21
Everyone Schema
Registry
22
Compatibility tips!
• Do you want to upgrade producer without touching consumers?
• You want *forward* compatibility.
• You can add fields.
• You can delete fields with defaults
• Do you want to update schema in a DWH but still read old messages?
• You want *backward* compatibility
• You can delete fields
• You can add fields with defaults
• Both?
• You want *full* compatibility.
• Default all things!
• Except the primary key which you never touch.
23
App Lifecycle
Developement
Source
Repository
Pull Request
CI Staging / Test
Production
“Nightly build”
Lets Try
Lets Do It!
Optional
Good
Idea
YES Almost
too late
Last
resort
24
Finding Schemas
1. Write Schemas -> Generate Code
2. Get Schemas from registry -> Generate code
3. Write code -> Generate Schemas
25
Summary!
1. Data Quality is critical
2. Schema are critical
3. Schema Registry is used to maintain quality
from Dev to Prod
4. Schema Registry is in Confluent Open Source
26
Attend the whole series!
Simplify Governance for Streaming Data in Apache Kafka
Date: Thursday, April 6, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Gwen Shapira, Product Manager, Confluent
Using Apache Kafka to Analyze Session Windows
Date: Thursday, March 30, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Michael Noll, Product Manager, Confluent
Monitoring and Alerting Apache Kafka with Confluent
Control Center
Date: Thursday, March 16, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Nick Dearden, Director, Engineering and Product
Data Pipelines Made Simple with Apache Kafka
Date: Thursday, March 23, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Ewen Cheslack-Postava, Engineer, Confluent
https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/
What’s New in Apache Kafka 0.10.2 and Confluent 3.2
Date: Thursday, March 9, 2017
Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET
Speaker: Clarke Patterson, Senior Director, Product
Marketing
27
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
Thoroughly tested and
quality assured
More extensible developer
experience
Easy upgrade path to
Confluent Enterprise
28
30% off conference fee
code = kafpcf17
New York City | May 8, 2017
Presented by
29
Thank You!

Mais conteúdo relacionado

Mais de Gwen (Chen) Shapira

Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Gwen (Chen) Shapira
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings MeetupGwen (Chen) Shapira
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersGwen (Chen) Shapira
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupGwen (Chen) Shapira
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupGwen (Chen) Shapira
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopGwen (Chen) Shapira
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3Gwen (Chen) Shapira
 

Mais de Gwen (Chen) Shapira (20)

Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings Meetup
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
 
Fraud Detection Architecture
Fraud Detection ArchitectureFraud Detection Architecture
Fraud Detection Architecture
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
Data Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for HadoopData Wrangling and Oracle Connectors for Hadoop
Data Wrangling and Oracle Connectors for Hadoop
 
Scaling etl with hadoop shapira 3
Scaling etl with hadoop   shapira 3Scaling etl with hadoop   shapira 3
Scaling etl with hadoop shapira 3
 
Is hadoop for you
Is hadoop for youIs hadoop for you
Is hadoop for you
 
Ssd collab13
Ssd   collab13Ssd   collab13
Ssd collab13
 

Último

Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 

Último (20)

Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 

Data Governance and Schema Registry for Software Developers

  • 1. 1 Simplify Governance of Streaming Data Gwen Shapira Product Manager, Apache Kafka™ PMC
  • 2. 2 Attend the whole series! Simplify Governance for Streaming Data in Apache Kafka Date: Thursday, April 6, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Gwen Shapira, Product Manager, Confluent Using Apache Kafka to Analyze Session Windows Date: Thursday, March 30, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Michael Noll, Product Manager, Confluent Monitoring and Alerting Apache Kafka with Confluent Control Center Date: Thursday, March 16, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Nick Dearden, Director, Engineering and Product Data Pipelines Made Simple with Apache Kafka Date: Thursday, March 23, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Ewen Cheslack-Postava, Engineer, Confluent https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/ What’s New in Apache Kafka 0.10.2 and Confluent 3.2 Date: Thursday, March 9, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Clarke Patterson, Senior Director, Product Marketing
  • 3. 3 What is Data Governance? Data Access Master Data Management Data Agility Data Reliability Data Quality
  • 4. 4 To Grow a Successful Data Platform, You need Some Guarantees
  • 5. 5 If you can’t trust your data, what can you trust?
  • 6. 7 Schemas Are a Key Part of This 1. Schemas: Enforced Contracts Between your Components 2. Schemas need to be agile but compatible 3. You need tools & process that will let you: • Move fast and not break things
  • 7. 9 Cautionary tale: Uber Producers Service 1 Service 2 RedshiftS3 EMR {created_at: “Dec 1, 2015”} CREATE ( created_at: STRING ) parseDate(created_at, “%M %d, %yyyy”)
  • 8. 10 Producers Service 1 Service 2 RedshiftS3 EMR {created_at: 1491368429} CREATE ( created_at: STRING ) parseDate(created_at, “%M %d, %yyyy”) Cautionary tale: Uber
  • 9. 11 The right way to do things MessageCustomer RoomGifts BookRoom LoyalCustomers
  • 10. 12 The right way to do things MessageCustomer RoomGifts BookRoom LoyalCustomers BeachPromo If (property.location == beach && customer.level == platinum) sendMessage({topic: RoomGifts, customer_id: customer.id, room: property.roomnum, gift: seaShell})
  • 11. 13 There is special value in standardizing parts of schema { "type": "record", "name": "Event", "fields": [{ "name": "headers", "type": { “name” : “headers”, “type”: ”record” “fields”: [ {“name”: “source_app”, “type”: “string”}, {“name”: “host_app”, “type”: “string”}, {“name”: “destination_app”, “type”: [“null,“string”]}, {“name”: “timestamp”, “type”: “long”} ]}, { "name": "body", "type": "bytes" }] } Standard headers allowed analyzing patterns of communication S3
  • 12. 15 In Request Response World – APIs between services are key In Async Stream Processing World – Message Schemas ARE the API …except they stick around for a lot longer
  • 13. 16 Lets assume you see why you need schema compatibility… How do we make it work for us?
  • 14. 17 Schema Registry Elastic Cassandra Streams App Example Consumers Serializer App 1 Serializer App 2 ! Kafka Topic! Schema Registry Define the expected fields for each Kafka topic Automatically handle schema changes (e.g. new fields) Prevent incompatible changes Supports multi-datacenter environments
  • 15. 18 Key Points: • Schema Registry is efficient – due to caching • Schema Registry is transparent • Schema Registry is Highly Available • Schema Registry prevents bad data at the source
  • 16. 19 Big Benefits • Schema Management is part of your entire app lifecycle – from dev to prod • Single source of truth for all apps and data stores • No more “Bad Data” randomly breaking things • Increased agility! add, remove and modify fields with confidence • Teams can share and discover data • Smaller messages on the wire and in storage
  • 17. 20 But the Best Part is… RDBM Hadoop / Hive Kafka w/ Schema Registry JDBC Source HDFS Sink Schema Schema Schema Schema Schema
  • 19. 22 Compatibility tips! • Do you want to upgrade producer without touching consumers? • You want *forward* compatibility. • You can add fields. • You can delete fields with defaults • Do you want to update schema in a DWH but still read old messages? • You want *backward* compatibility • You can delete fields • You can add fields with defaults • Both? • You want *full* compatibility. • Default all things! • Except the primary key which you never touch.
  • 20. 23 App Lifecycle Developement Source Repository Pull Request CI Staging / Test Production “Nightly build” Lets Try Lets Do It! Optional Good Idea YES Almost too late Last resort
  • 21. 24 Finding Schemas 1. Write Schemas -> Generate Code 2. Get Schemas from registry -> Generate code 3. Write code -> Generate Schemas
  • 22. 25 Summary! 1. Data Quality is critical 2. Schema are critical 3. Schema Registry is used to maintain quality from Dev to Prod 4. Schema Registry is in Confluent Open Source
  • 23. 26 Attend the whole series! Simplify Governance for Streaming Data in Apache Kafka Date: Thursday, April 6, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Gwen Shapira, Product Manager, Confluent Using Apache Kafka to Analyze Session Windows Date: Thursday, March 30, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Michael Noll, Product Manager, Confluent Monitoring and Alerting Apache Kafka with Confluent Control Center Date: Thursday, March 16, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Nick Dearden, Director, Engineering and Product Data Pipelines Made Simple with Apache Kafka Date: Thursday, March 23, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Ewen Cheslack-Postava, Engineer, Confluent https://www.confluent.io/online-talk/online-talk-series-five-steps-to-production-with-apache-kafka/ What’s New in Apache Kafka 0.10.2 and Confluent 3.2 Date: Thursday, March 9, 2017 Time: 9:30 am - 10:00 am PT | 12:30 pm - 1:00 pm ET Speaker: Clarke Patterson, Senior Director, Product Marketing
  • 24. 27 Get Started with Apache Kafka Today! https://www.confluent.io/downloads/ THE place to start with Apache Kafka! Thoroughly tested and quality assured More extensible developer experience Easy upgrade path to Confluent Enterprise
  • 25. 28 30% off conference fee code = kafpcf17 New York City | May 8, 2017 Presented by

Notas do Editor

  1. Data Quality
  2. https://cdn.oreillystatic.com/en/assets/1/event/144/Scalable%20schema%20management%20for%20Hadoop%20and%20Spark%20applications%20Presentation.pdf
  3. This diagram shows what schema registry is for. From left to right: Schema registry is specific to a Kafka cluster, so you can have one schema registry per cluster. Applications can be third party apps like Twitter or SFDC or a custom application. Schema registry is relevant to every producer who can feed messages to your cluster. The schema registry exists outside your applications as a separate process. Within an application, there is a function called serializer that serializes messages for delivery to the kafka data pipeline. Confluent’s schema registry and integrated into the serializer. Some other apex developer companies have created their own schema registries outside the serializer; argument is that’s more complexity for the same functionality. The process goes like this: The serializer places a call to the schema registry, to see if it has a format for the data the application wants to publish. If it does, then schema registry passes that format to the application’s serializer, which uses it to filter out incorrectly formatted messages. This keeps your data pipeline clean. The format confluent uses for its schema registry is called Avro. Avro is the only known generally published schema format. i.e. everything else is custom. When you use the confluent schema registry, it’s automatically integrated into the serializer and there’s no effort you need to put into it. Your data pipeline will just be clean. You simply need to have all applications call the schema registry when publishing. The ultimate benefit of schema registry is that the consumer applications of your kafka topic can successfully read the messages they receive.
  4. Schema Registry is an industry best practice – Uber, Yelp, AirBnB, Ancestry, Monsanto