New age Distributed Messaging
Kafka & Concepts explored !!
Dileep Varma Kalidindi
Nov 2014
Who Am I ?
4/5/2016 Confidential 2
Name: Dileep Varma Kalidindi
Status: Senior Engineer @Responsys (since Apr’14), Circles Team.
Fascination: Problem solving, distributed & BigData-churning systems.
Past: 8+ years with VeriSign, Informatica Labs, NTT Data.
Hobbies: Jumping (Water & Air)
What is brewing today?
• Responsys Technology Road Map.
• Data off the limits – Handling & Processing BigData
• Scope for New Age capabilities (in distributed messaging) – Architecture peek-through
• Existing System bottlenecks & shortfalls
• Rethinking from fundamentals – Distributed Commit Log
• Kafka Messaging – Concept, Architecture, API & Demo
• Kafka Internals – ZooKeeper in depth, Atomic broadcast & Quorum
• Performance & feature comparisons – Traditional vs New Age
Are we good ?
Data off the limits – Handling larger Data sets
• Kafka on Responsys technology Road map - Antonio
• Data evolution from Traditional to BigData
• Characterized by Volume, Variety, Velocity, Variability, Veracity & Complexity
• Volume -> Quantity of data. Storage & Processing (Hadoop, NoSQL)
• Variety -> Diversity of data sets, OLTP, OLAP (NoSQL, NewSQL)
• Velocity -> Speed of data handling in real time (Kafka, Storm, Flume)
• Deeper market penetration implicitly transforms Data
• Our focus is on Velocity
• Need of the hour: systems that can handle this data, i.e. BigData Technologies
BigData Technologies – MindMap view
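Identifying Scope – Architecture Peek-in
[Architecture diagram with Application and Database tiers. Labels recovered from the figure: Uber, UI, PUB, WS, CN, Bounce, IS, LA, JMS, EC, SPAM, ETL, AB, EMD, CL, PD, ICR, Content, IDDP, Short URL, SUL, DIS, SMS, PG, PUSH, SMSL; databases EventDB, CustDB, ReportDB, SysAdmDB, Data Warehouse, AuditDB, UsageDB; and a region marked REAL TIME PROCESSING.]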
Is there a problem with my current System?
• Existing systems (IBM MQ) are good in the traditional sense.
• Delivery guarantees are good for Emails, but what about events (PubWeb, Bounce, AB)?
• Focus on throughput: existing brokers have limitations.
• Scaling and Replication; cost of Cluster maintenance in existing MQ.
• Dynamic rebalancing of Brokers and Consumers
Rethink from Fundamentals
LOGS
Logs – fundamental System blocks
• Log (as a foundation):
- Append-only, totally ordered sequence of records, ordered by time.
- Each entry has a unique, sequential entry number (a timestamp decoupled from any physical clock).
- Deterministic
• Logging (as a core process):
- IS machine-readable logging. Ex: write-ahead logs, commit logs & transaction logs
- IS NOT application logging (human readable). Ex: Log4j, slf4j, etc.
• Backbone of Distributed Messaging, Databases, NoSQL, Key-Value stores, replication, Hadoop, Version Control…
• Logs for Data Integration, Real-time processing & System building.
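The log properties above can be sketched in a few lines of Python. This is a toy in-memory illustration, not how Kafka stores data; the class name `AppendOnlyLog` and its methods are hypothetical and exist only for this example.

```python
from typing import List


class AppendOnlyLog:
    """Toy in-memory commit log: records can only be appended, never changed."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        """Append a record and return its sequential entry number."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[bytes]:
        """Return up to max_records records starting at offset, in log order."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._records[offset:offset + max_records]


log = AppendOnlyLog()
first = log.append(b"user signed up")
second = log.append(b"user clicked link")
```

The entry number returned by `append` plays the role of the clock-decoupled timestamp: a reader that remembers it can resume from exactly that point.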
Logs – solving Problems
• Logs are not new in Databases!
- Started with IBM System R
- Physical logging: values of the rows changed. Logical logging: the SQL queries
- Log implementations: from ACID to Replication (GoldenGate)
• State Machine Replication Principle
If two identical, deterministic processes begin in the same state and get the same inputs in the same order, they produce the same output and end in the same state.
• In Distributed Systems, logs solve core problems
Ordering changes; distributing data
• Processing and replication
Active – Passive
Active – Active
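A minimal sketch of the state machine replication principle, assuming a hypothetical command format of (operation, key, amount). Two replicas that replay the same log in the same order end in the same state; replaying the same commands in a different order does not.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # (operation, key, amount), format assumed for illustration


def apply(state: Dict[str, int], command: Command) -> None:
    """Deterministically apply one command to the state."""
    op, key, amount = command
    if op == "set":
        state[key] = amount
    elif op == "add":
        state[key] = state.get(key, 0) + amount
    else:
        raise ValueError(f"unknown operation: {op}")


def replay(log: List[Command]) -> Dict[str, int]:
    """Start from the empty state and apply every command in log order."""
    state: Dict[str, int] = {}
    for command in log:
        apply(state, command)
    return state


shared_log: List[Command] = [
    ("set", "clicks", 0),
    ("add", "clicks", 5),
    ("add", "opens", 2),
    ("add", "clicks", 1),
]

primary = replay(shared_log)
replica = replay(shared_log)                    # same inputs, same order
reordered = replay(list(reversed(shared_log)))  # same inputs, different order
```

This is why the ordering guarantee of the log matters: the order of the entries, not just their content, determines the final state.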
Logs – driving Architecture
• Log-structured data flow
- Cache system
- Asynchronous Production & Consumption
• Kafka Log-centric approach:
- Not a Database, Log file collection, Typical messaging system
• Event-driven architecture:
- Kafka: event-driven, multi-subscriber system (Topic)
- Example: multiple operations performed on one event
Logs in ACTION
APACHE KAFKA
Kafka
• Introducing Kafka
“Should I wake up now? ... Why?”
• Kafka Core Concepts
Topics, partitions, replicas, producers, consumers, brokers
• Operating Kafka
Architecture, deploying, monitoring, P&S tuning
Introducing Kafka
http://kafka.apache.org/
Originated at LinkedIn, open sourced in early 2011
Implemented in Scala, some Java
9 core committers, plus ~ 20 contributors
Kafka is a distributed, partitioned, replicated commit log service: a uniquely designed pub-sub messaging system.
Designed for:
• High throughput to support high-volume event feeds.
• Real-time processing of these feeds to create new, derived feeds.
• Low-latency delivery to handle traditional messaging use cases.
• Guaranteed fault tolerance.
Kafka in Real business
Kafka is Amazingly fast – How ?
• “Up to 2 million writes/sec on 3 cheap machines”
• Using 3 producers on 3 different machines, 3x async replication
• Only 1 producer/machine because NIC already saturated
• Sustained throughput as stored data grows
• Slightly different test config than 2M writes/sec above.
Kafka is Amazingly fast – Why ?
• Fast writes:
• While Kafka persists all data to disk, essentially all writes go to the
page cache of the OS, i.e. RAM.
• Cf. hardware specs and OS tuning (we cover this later)
• Fast reads:
• Very efficient to transfer data from page cache to a network socket
• Linux: sendfile() system call
• Combination of the two = fast Kafka!
• Example (Operations): On a Kafka cluster where the consumers are mostly caught
up you will see no read activity on the disks as they will be serving data entirely
from cache.
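The fast-read path can be illustrated with Python's `socket.sendfile()`, which uses the operating system's `sendfile()` call where available and otherwise falls back to ordinary sends. This is only a sketch of the technique under stated assumptions: a temporary file stands in for a log segment, a local socket pair stands in for the consumer connection, and the payload is made up.

```python
import socket
import tempfile

# Made-up bytes standing in for a log segment on disk.
payload = b"kafka-log-segment-bytes" * 100

with tempfile.TemporaryFile() as segment:
    segment.write(payload)
    segment.flush()  # make sure the bytes reach the OS before sendfile reads them

    sender, receiver = socket.socketpair()
    with sender, receiver:
        # Hand the file to the kernel; user-space code never copies the bytes.
        sent = sender.sendfile(segment)
        sender.shutdown(socket.SHUT_WR)  # signal end of stream to the reader

        received = bytearray()
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received.extend(chunk)
```

The point of the design is that the file data moves from the page cache to the socket without being copied through the application, which is what keeps consumer reads cheap.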
Kafka Core Concepts - A first look
• The who is who
• Producers write data to brokers.
• Consumers read data from brokers.
• All this is distributed.
• The data
• Data is stored in topics.
• Topics are split into partitions, which are replicated
Kafka Concepts - Topics
• Topic: feed name to which messages are published
• Example: “pubweb.event.2”
Kafka Concepts - Creating a Topic
• Creating a topic
• CLI
$ kafka-topics.sh --zookeeper zookeeper1:2181 --create --topic zerg.hydra \
    --partitions 3 --replication-factor 2 \
    --config x=y
• API
https://github.com/miguno/kafka-storm-starter/blob/develop/src/main/scala/com/miguno/kafkastorm/storm/KafkaStormDemo.scala
• Auto-create via auto.create.topics.enable = true
• Modifying a topic
- Add partitions
- Add configs
- Remove configs
- Deleting topics
Kafka Concepts - Partitions
• A topic consists of partitions
• Partition: ordered + immutable sequence of messages
that is continually appended to
• The number of partitions of a topic is configurable
Kafka Concepts - Partition Offset
• Offset: messages in the partitions are each assigned a unique (per
partition) and sequential id called the offset
• Consumers track their pointers via (offset, partition, topic) tuples
[Figure: consumer group C1]
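Offset tracking can be sketched as follows. This is a toy model, not the Kafka consumer API: the `partitions` dictionary stands in for a topic with two partitions, the messages are made up, and `OffsetTrackingConsumer` is a hypothetical name. The consumer keeps one pointer per (topic, partition) pair, which is the tuple described above.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one topic with two partitions.
partitions: Dict[int, List[str]] = {
    0: ["open:u1", "click:u1", "open:u3"],
    1: ["open:u2", "bounce:u2"],
}


class OffsetTrackingConsumer:
    """Toy consumer that remembers the next offset per (topic, partition)."""

    def __init__(self, topic: str) -> None:
        self.topic = topic
        self.positions: Dict[Tuple[str, int], int] = {}

    def poll(self, partition: int, max_messages: int = 2) -> List[str]:
        """Read the next batch from one partition and advance the pointer."""
        key = (self.topic, partition)
        start = self.positions.get(key, 0)
        batch = partitions[partition][start:start + max_messages]
        self.positions[key] = start + len(batch)
        return batch


consumer = OffsetTrackingConsumer("pubweb.event.2")
first = consumer.poll(0)
second = consumer.poll(0)
other = consumer.poll(1)
```

Because the position lives with the consumer, re-reading old messages is just a matter of resetting a pointer to an earlier offset.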
Kafka Concepts - Partition Replicas
• Replicas: “backups” of a partition
• They exist solely to prevent data loss.
• Clients never read from or write to follower replicas; all reads and writes go through the partition leader.
• They do NOT help to increase producer or consumer parallelism!
Topics vs Partitions vs Replicas
Kafka Concepts - Topic inspection
• --describe the topic
• Leader: brokerID of the currently elected leader broker
• Replica IDs = broker IDs
• ISR = “in-sync replica”, replicas that are in sync with the leader
• In this example:
• Broker 0 is leader for partition 1.
• Broker 1 is leader for partitions 0 and 2.
• All replicas are in-sync with their respective leader partitions.
$ kafka-topics.sh --zookeeper zookeeper1:2181 --describe --topic zerg2.hydra
Topic:zerg2.hydra PartitionCount:3 ReplicationFactor:2 Configs:
Topic: zerg2.hydra Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: zerg2.hydra Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: zerg2.hydra Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
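For monitoring scripts, the describe output can be parsed into leaders, replicas and ISR sets. The sketch below assumes the output format matches the sample shown on this slide (the real tool may separate fields with tabs, which the pattern tolerates); the function and variable names are hypothetical.

```python
import re
from typing import Dict, List

# Sample output copied from the slide above.
describe_output = """\
Topic:zerg2.hydra PartitionCount:3 ReplicationFactor:2 Configs:
Topic: zerg2.hydra Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: zerg2.hydra Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: zerg2.hydra Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
"""

PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_partitions(text: str) -> List[Dict[str, object]]:
    """Return one dict per partition line; the summary line is skipped."""
    rows: List[Dict[str, object]] = []
    for line in text.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue
        rows.append({
            "partition": int(match["partition"]),
            "leader": int(match["leader"]),
            "replicas": [int(b) for b in match["replicas"].split(",")],
            "isr": [int(b) for b in match["isr"].split(",") if b],
        })
    return rows


rows = parse_partitions(describe_output)
leaders = {row["partition"]: row["leader"] for row in rows}
# A partition is under-replicated when some replica is missing from the ISR.
under_replicated = [
    row["partition"] for row in rows if set(row["isr"]) != set(row["replicas"])
]
```

For the sample above every replica is in sync, so the under-replicated list is empty.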
Kafka Concepts - Consumers & Producers
Kafka Concepts - Producer
• Code
• Start Producer
Kafka Concepts - Consumers
• Code
• Start Consumer
• Multithreaded Consumer for multiple
partitions
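The multithreaded consumer idea can be sketched with one thread per partition. This is not the Kafka consumer API: the `partitions` dictionary and its messages are made up, and each thread simply walks its own partition in offset order.

```python
import threading
from typing import Dict, List

# Hypothetical in-memory partitions of one topic.
partitions: Dict[int, List[str]] = {
    0: ["a", "b"],
    1: ["c"],
    2: ["d", "e", "f"],
}

# One result list per partition; each thread writes only to its own list,
# so no lock is needed in this sketch.
results: Dict[int, List[str]] = {p: [] for p in partitions}


def consume_partition(partition_id: int) -> None:
    """Process every message of one partition in offset order."""
    for offset, message in enumerate(partitions[partition_id]):
        results[partition_id].append(f"{offset}:{message}")


threads = [
    threading.Thread(target=consume_partition, args=(p,)) for p in partitions
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Ordering is preserved within each partition but not across partitions, which is why the partition count bounds the useful consumer parallelism.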
Kafka Core Concepts - Recap
• The who is who
• Producers write data to brokers.
• Consumers read data from brokers.
• All this is distributed.
• The data
• Data is stored in topics.
• Topics are split into partitions, which are replicated
Monitoring & Testing
Kafka – Monitoring and Testing
• JMX Enabled
• System tools
• Describe
• Quantifind KafkaOffsetMonitor
• Monitoring DEMO
Empowering Kafka
Apache ZooKeeper
• Apache Kafka uses ZooKeeper to detect crashes, implement topic discovery, and maintain production & consumption state for topics.
• High-performance coordination service for distributed applications.
• Separation of concerns (SoC): separates coordination overhead from application logic.
• Centralized service for naming (registry), configuration management, synchronization, and group membership services.
• ZooKeeper is the backbone for HBase, Solr, Facebook messaging apps & many more distributed apps.
• Simple, Replicated, Ordered and Fast
ZooKeeper – Internals
• Znodes
- Persistent: exists until deleted
- Ephemeral: session scope
• Reads are served by all nodes; writes go through the leader
• Data is stored as a byte array
• Allows watches and notifications
• Ensemble: a group of servers available to serve requests
• Quorum-determined leader election
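The znode types and watches listed above can be sketched with a toy in-memory store. This is not the ZooKeeper client API: `ToyZnodeStore`, its methods, the session id and the paths are all made up for illustration. It shows a persistent node surviving, an ephemeral node disappearing when its session ends, and a one-shot watch firing on that deletion.

```python
from typing import Callable, Dict, List, Optional


class ToyZnodeStore:
    """In-memory sketch of persistent/ephemeral znodes with one-shot watches."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}   # znode data is a byte array
        self._owner: Dict[str, int] = {}    # ephemeral path -> session id
        self._watches: Dict[str, List[Callable[[str, str], None]]] = {}

    def create(self, path: str, data: bytes,
               ephemeral_session: Optional[int] = None) -> None:
        if path in self._data:
            raise KeyError(f"znode already exists: {path}")
        self._data[path] = data
        if ephemeral_session is not None:
            self._owner[path] = ephemeral_session

    def exists(self, path: str) -> bool:
        return path in self._data

    def watch(self, path: str, callback: Callable[[str, str], None]) -> None:
        self._watches.setdefault(path, []).append(callback)

    def delete(self, path: str) -> None:
        del self._data[path]
        self._owner.pop(path, None)
        # Watches fire once and are then discarded.
        for callback in self._watches.pop(path, []):
            callback("deleted", path)

    def close_session(self, session_id: int) -> None:
        """Ending a session removes every ephemeral znode it created."""
        owned = [p for p, s in self._owner.items() if s == session_id]
        for path in owned:
            self.delete(path)


store = ToyZnodeStore()
events: List[tuple] = []
store.create("/config/topics/pubweb.event.2", b"partitions=3")         # persistent
store.create("/brokers/ids/0", b"host=broker0", ephemeral_session=42)  # ephemeral
store.watch("/brokers/ids/0", lambda kind, path: events.append((kind, path)))
store.close_session(42)  # simulates the broker's session ending
```

This combination of ephemeral nodes and watches is the mechanism behind crash detection: when a broker's session dies, its node vanishes and the watchers are notified.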
ZooKeeper – Guarantees
• Follows the principles of ATOMIC broadcast
- Sequential Consistency: updates are applied in order
- Atomicity: updates either succeed or fail
- Single system image: same view of the service regardless of the ZK server
- Reliability: persistence of updates
- Timeliness: the system is guaranteed to be up to date within a time bound
• In Summary: ZooKeeper = { Leader Activation + Message delivery }
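Quorum-based leader election and atomic broadcast both rest on simple majority arithmetic. A small sketch, with hypothetical function names, of how many servers must agree and how many failures an ensemble of a given size can tolerate:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, live_servers: int) -> bool:
    """True when enough servers are alive to elect a leader and commit updates."""
    return live_servers >= quorum_size(ensemble_size)


# ensemble size -> (quorum size, tolerated failures)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5)}
```

Note that 4 servers tolerate no more failures than 3, which is why ensembles are usually sized with an odd number of servers.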
Kafka Performance
Kafka performance – Producer tests
(LinkedIn benchmark test)
• Hardware setup with 2 Linux nodes
• Each with eight 2 GHz cores (8 cores/machine, ~16 GHz processing)
• 16 GB of RAM, 6 disks with RAID 10 and a 1 Gb network connection.
• Producer test
• Single producer publishing 10 million messages of 200 bytes each
• Kafka message batch sizes of 1 and 50; other MQs with no batching
• X-axis: messages sent to broker; Y-axis: producer throughput
• Why is the producer fast
• No ACK
• Batching
• Kafka storage format
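The effect of batching can be sketched with a toy producer that counts requests instead of talking to a broker. This is not the Kafka producer API: `BatchingProducer` is a hypothetical class, and the message count of 1000 is scaled down from the benchmark purely to keep the example quick. The batch sizes 1 and 50 and the 200-byte message size come from the test setup above.

```python
from typing import List


class BatchingProducer:
    """Toy producer: buffers messages and sends them in one request per batch."""

    def __init__(self, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self._buffer: List[bytes] = []
        self.requests_sent = 0           # stands in for network round trips
        self.delivered: List[bytes] = []

    def send(self, message: bytes) -> None:
        """Buffer a message; flush automatically once the batch is full."""
        self._buffer.append(message)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as a single request, without waiting for an ACK."""
        if not self._buffer:
            return
        self.delivered.extend(self._buffer)
        self.requests_sent += 1
        self._buffer = []


def requests_needed(batch_size: int, message_count: int = 1000) -> int:
    producer = BatchingProducer(batch_size)
    for _ in range(message_count):
        producer.send(b"x" * 200)  # 200-byte messages, as in the benchmark
    producer.flush()               # send any partial final batch
    return producer.requests_sent


unbatched = requests_needed(1)
batched = requests_needed(50)
```

With a batch size of 50 the same messages need fifty times fewer requests, which is the main lever behind the producer throughput gap in the benchmark.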
Kafka performance – Consumer tests
(LinkedIn benchmark test)
• Hardware setup with 2 Linux nodes
• Each with eight 2 GHz cores (8 cores/machine, ~16 GHz processing)
• 16 GB of RAM, 6 disks with RAID 10 and a 1 Gb network connection.
• Consumer test
• Single consumer retrieves 10 million messages of 200 bytes each
• Each pull request fetches 1000 messages (about 200 KB)
• X-axis: messages consumed from broker; Y-axis: consumer throughput
• Why is the consumer fast
• No delivery state storage
• Kafka storage format (less data transmitted)
Summary, Conclusions
&
References
Summary – quick Recap
• Importance of Handling & Processing BigData
• Scope for introduction in Responsys Architecture
• Existing System bottlenecks & shortfalls
• Distributed Commit Log
• Kafka Messaging
• Kafka Internals – ZooKeeper
• Performance & feature comparisons – Traditional vs New Age
Conclusion – Open ended
• Limitation is on Data – not on Systems
• No need for complete revamp
• Choosing the right systems at the right time is the recipe.
References
1. https://kafka.apache.org/
2. http://zookeeper.apache.org/
3. http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
4. http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
THANK YOU

Time Series Foundation Models - current state and future directions
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 

Distributed messaging through Kafka

  • 1. New age Distributed Messaging Kafka & Concepts explored !! Dileep Varma Kalidindi Nov 2014
  • 2. Who Am I? 4/5/2016 Confidential 2
      • Name: Dileep Varma Kalidindi
      • Status: Senior Engineer @Responsys (since Apr’14), Circles Team
      • Fascination: problem solving, distributed & BigData churning systems
      • Past: 8+ years with VeriSign, Informatica Labs, NTT Data
      • Hobbies: Jumping (Water & Air)
  • 3. What is brewing today?
      • Responsys technology road map
      • Data off the limits: handling & processing BigData
      • Scope for new-age capabilities (in distributed messaging): architecture peek-through
      • Existing system bottlenecks & shortfalls
      • Rethinking from fundamentals: the distributed commit log
      • Kafka messaging: concept, architecture, API & demo
      • Kafka internals: ZooKeeper in depth, atomic broadcast & quorum
      • Performance & feature comparisons: traditional vs new age
  • 4. Are we good?
  • 5. Data off the limits – Handling larger Data sets
      • Kafka on the Responsys technology road map - Antonio
      • Data evolution from traditional to BigData
      • BigData is characterized by Volume, Variety, Velocity, Variability, Veracity & Complexity
          - Volume: quantity of data; storage & processing (Hadoop, NoSQL)
          - Variety: diversity of data sets, OLTP, OLAP (NoSQL, NewSQL)
          - Velocity: speed of data handling in real time (Kafka, Storm, Flume)
      • Deeper market penetration implicitly transforms data
      • Our focus is on Velocity
      • The need of the hour is systems that can handle this: BigData technologies
  • 6. BigData Technologies – MindMap view
  • 7. Identifying Scope – Architecture Peek-in [Architecture diagram: application components, JMS-based services and databases (EventDB, CustDB, ReportDB, SysAdmDB, AuditDB, UsageDB, Data Warehouse); diagram label: REAL TIME PROCESSING]
  • 8. Is there a problem with my current system?
      • Existing systems (IBM MQ) are good in the traditional sense.
      • Delivery guarantees work well for emails, but what about events (PubWeb, Bounce, AB)?
      • Focus on throughput: existing brokers have limitations.
      • Scaling and replication; cost of cluster maintenance in the existing MQ.
      • Dynamic rebalancing of brokers and consumers.
  • 10. Logs – fundamental System blocks
      • Log (as a foundation):
          - Append-only, totally ordered sequence of records, ordered by time
          - Each entry gets a unique, sequential entry number (a "timestamp" decoupled from any physical clock)
          - Deterministic
      • Logging (as a core process):
          - IS machine-readable logging, e.g. write-ahead logs, commit logs & transaction logs
          - IS NOT application logging (human readable), e.g. Log4j, slf4j
      • Backbone of distributed messaging, databases, NoSQL, key-value stores, replication, Hadoop, version control…
      • Logs for data integration, real-time processing & system building
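
The log described on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not Kafka's implementation; the class name `AppendOnlyLog` and its methods are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppendOnlyLog:
    """Toy in-memory commit log: records are only appended, never changed."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Append a record and return its sequence number (0, 1, 2, ...)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, sequence: int) -> List[bytes]:
        """Return every record at or after the given sequence number, in order."""
        return self._records[sequence:]


log = AppendOnlyLog()
first = log.append(b"user-created")
second = log.append(b"user-updated")
replayed = log.read_from(first)
```

The sequence number plays the role of the clock-decoupled timestamp: it orders entries without reference to wall-clock time, and any reader can replay the log from a chosen position.
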
  • 11. Logs – solving Problems
      • Logs are not new in databases!
          - Started with IBM System R
          - Physical logging records the values of the rows changed; logical logging records the SQL queries
          - Log implementations range from ACID to replication (GoldenGate)
      • State machine replication principle: two identical, deterministic processes that begin in the same state and get the same inputs in the same order produce the same outputs and end in the same state
      • In distributed systems, logs solve two core problems:
          - Ordering changes
          - Distributing data
      • Processing and replication:
          - Active-passive
          - Active-active
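
The state machine replication principle can be demonstrated with a toy example. The command format and the `apply_log` function below are assumptions made for illustration; they do not come from the slides or from any particular system.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # (operation, key, amount)


def apply_log(commands: List[Command]) -> Dict[str, int]:
    """Deterministically apply an ordered command log to an empty state."""
    state: Dict[str, int] = {}
    for op, key, amount in commands:
        if op == "set":
            state[key] = amount
        elif op == "add":
            state[key] = state.get(key, 0) + amount
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


shared_log: List[Command] = [("set", "x", 1), ("add", "x", 4), ("set", "y", 7)]

# Two replicas that consume the same log in the same order converge.
replica_a = apply_log(shared_log)
replica_b = apply_log(shared_log)

# A replica that sees the first two commands swapped ends in a different state.
reordered = apply_log([shared_log[1], shared_log[0], shared_log[2]])
```

Both replicas end with `x = 5` and `y = 7`, while the reordered run ends with `x = 1`. This is why the log's total order, not just its contents, is what replicas have to agree on.
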
  • 12. Logs – driving Architecture
      • Log-structured data flow
          - Cache system
          - Asynchronous production & consumption
      • Kafka's log-centric approach:
          - Not a database, log file collection, typical messaging system
      • Event-driven architecture:
          - Kafka: event driven, multi-subscriber system (topic)
          - Example: a job that performs multiple operations on one event
  • 13. Logs in ACTION: APACHE KAFKA
  • 14. Kafka
      • Introducing Kafka: “Should I wake up now? ...Why?”
      • Kafka core concepts: topics, partitions, replicas, producers, consumers, brokers
      • Operating Kafka: architecture, deploying, monitoring, P&S tuning
  • 15. Introducing Kafka
      • http://kafka.apache.org/
      • Originated at LinkedIn, open sourced in early 2011
      • Implemented in Scala, with some Java
      • 9 core committers, plus ~20 contributors
      • Kafka is a distributed, partitioned, replicated commit log service: a uniquely designed pub-sub messaging system
      • Designed for:
          - High throughput, to support high-volume event feeds
          - Real-time processing of these feeds to create new, derived feeds
          - Low-latency delivery, to handle traditional messaging use cases
          - Guaranteed fault tolerance
  • 16. Kafka in Real business
  • 17. Kafka is Amazingly fast – How?
      • “Up to 2 million writes/sec on 3 cheap machines”
      • Using 3 producers on 3 different machines, 3x async replication
      • Only 1 producer/machine because the NIC is already saturated
      • Sustained throughput as stored data grows
      • Slightly different test config than the 2M writes/sec above
  • 18. Kafka is Amazingly fast – Why?
      • Fast writes:
          - While Kafka persists all data to disk, essentially all writes go to the OS page cache, i.e. RAM.
          - Cf. hardware specs and OS tuning (covered later)
      • Fast reads:
          - Very efficient to transfer data from the page cache to a network socket
          - Linux: sendfile() system call
      • Combination of the two = fast Kafka!
      • Example (operations): on a Kafka cluster where the consumers are mostly caught up, you will see no read activity on the disks, as the brokers will be serving data entirely from cache.
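
The fast-read path can be tried out in miniature with Python's standard `socket.sendfile()`, which uses the operating system's `sendfile()` call where available and otherwise falls back to ordinary sends. This is only an analogy for the mechanism named on the slide, not Kafka code; the payload and the local socket pair are stand-ins assumed for the example.

```python
import socket
import tempfile

# Stand-in for a log segment on disk (hypothetical content).
payload = b"kafka-log-segment-bytes" * 10

with tempfile.TemporaryFile() as segment:
    segment.write(payload)
    segment.flush()
    segment.seek(0)

    # A connected local socket pair stands in for the broker-to-consumer link.
    reader, writer = socket.socketpair()
    with reader, writer:
        # Hand the file to the kernel instead of copying it through a
        # user-space buffer with read() followed by send().
        sent = writer.sendfile(segment)
        writer.shutdown(socket.SHUT_WR)

        received = b""
        while chunk := reader.recv(4096):
            received += chunk
```

The application never touches the bytes between file and socket. When the file's pages are already in the page cache, the transfer needs no disk read either.
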
  • 19. Kafka Core Concepts - A first look
      • The who is who
          - Producers write data to brokers.
          - Consumers read data from brokers.
          - All this is distributed.
      • The data
          - Data is stored in topics.
          - Topics are split into partitions, which are replicated.
  • 20. Kafka Concepts - Topics
      • Topic: feed name to which messages are published
      • Example: “pubweb.event.2”
  • 21. Kafka Concepts - Topics
  • 22. Kafka Concepts - Creating a Topic
      • Creating a topic
          - CLI: $ kafka-topics.sh --zookeeper zookeeper1:2181 --create --topic zerg.hydra --partitions 3 --replication-factor 2 --config x=y
          - API: https://github.com/miguno/kafka-storm-starter/blob/develop/src/main/scala/com/miguno/kafkastorm/storm/KafkaStormDemo.scala
          - Auto-create via auto.create.topics.enable = true
      • Modifying a topic
          - Add partitions
          - Add configs
          - Remove configs
      • Deleting topics
  • 23. Kafka Concepts - Partitions
      • A topic consists of partitions.
      • Partition: an ordered, immutable sequence of messages that is continually appended to
      • The number of partitions of a topic is configurable.
  • 24. Kafka Concepts - Partition Offset
      • Offset: each message in a partition is assigned a unique (per partition), sequential id called the offset.
      • Consumers track their pointers via (offset, partition, topic) tuples.
      • [Figure label: Consumer group C1]
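
Per-partition offsets and consumer-side position tracking can be modelled as follows. `ToyTopic` and `ToyConsumer` are hypothetical in-memory stand-ins, not the Kafka client API; the topic name is borrowed from the earlier slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyTopic:
    """In-memory stand-in for a topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append to one partition; the returned offset is unique per partition."""
        self.partitions[partition].append(message)
        return len(self.partitions[partition]) - 1


class ToyConsumer:
    """Tracks its own position as a (topic, partition) -> next offset mapping."""

    def __init__(self) -> None:
        self.positions: Dict[Tuple[str, int], int] = defaultdict(int)

    def poll(self, topic: ToyTopic, partition: int, max_messages: int) -> List[str]:
        """Read up to max_messages from the current position and advance it."""
        start = self.positions[(topic.name, partition)]
        batch = topic.partitions[partition][start:start + max_messages]
        self.positions[(topic.name, partition)] = start + len(batch)
        return batch


topic = ToyTopic("pubweb.event.2", num_partitions=2)
offsets = [topic.append(0, "open"), topic.append(0, "click"), topic.append(1, "bounce")]

c1, c2 = ToyConsumer(), ToyConsumer()
first_batch = c1.poll(topic, 0, max_messages=1)
second_batch = c1.poll(topic, 0, max_messages=10)
other_reader = c2.poll(topic, 0, max_messages=10)
```

Offsets restart at 0 in each partition, and the two consumers read the same partition independently because each keeps its own position. Nothing is removed from the log when a message is read.
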
  • 25. Kafka Concepts - Partition Replicas
      • Replicas: “backups” of a partition
      • They exist solely to prevent data loss.
      • Replicas are never read from or written to by producers or consumers (clients talk only to the partition leader).
      • They do NOT help to increase producer or consumer parallelism!
  • 26. Topics vs Partitions vs Replicas
  • 27. Kafka Concepts - Topic inspection
      • --describe the topic:
          $ kafka-topics.sh --zookeeper zookeeper1:2181 --describe --topic zerg.hydra
          Topic:zerg2.hydra PartitionCount:3 ReplicationFactor:2 Configs:
          Topic: zerg2.hydra Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
          Topic: zerg2.hydra Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
          Topic: zerg2.hydra Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
      • Leader: broker ID of the currently elected leader broker
      • Replica IDs = broker IDs
      • ISR = “in-sync replicas”, the replicas that are in sync with the leader
      • In this example:
          - Broker 0 is leader for partition 1.
          - Broker 1 is leader for partitions 0 and 2.
          - All replicas are in sync with their respective leader partitions.
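
The reading of the `--describe` output given on this slide can be checked mechanically. The sketch below parses the three partition lines copied from the slide; the regular expression and the helper `parse_partitions` are assumptions written for this example, not part of any Kafka tooling.

```python
import re
from typing import Dict, List

# Partition lines copied from the --describe output on the slide.
describe_output = """\
Topic: zerg2.hydra Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: zerg2.hydra Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: zerg2.hydra Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
"""

LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def parse_partitions(text: str) -> List[Dict[str, object]]:
    """Turn each partition line into a dict of ints and broker-id lists."""
    rows = []
    for match in LINE.finditer(text):
        partition, leader, replicas, isr = match.groups()
        rows.append({
            "partition": int(partition),
            "leader": int(leader),
            "replicas": [int(b) for b in replicas.split(",")],
            "isr": [int(b) for b in isr.split(",") if b],
        })
    return rows


partitions = parse_partitions(describe_output)

# Group partitions by their leader broker.
leaders: Dict[int, List[int]] = {}
for row in partitions:
    leaders.setdefault(row["leader"], []).append(row["partition"])

# A partition is under-replicated when its ISR is smaller than its replica set.
under_replicated = [
    row["partition"] for row in partitions
    if set(row["isr"]) != set(row["replicas"])
]
```

The result matches the slide: broker 1 leads partitions 0 and 2, broker 0 leads partition 1, and no partition is under-replicated.
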
  • 28. Kafka Concepts - Consumers & Producers
  • 29. Kafka Concepts - Producer
      • Code
      • Start Producer
  • 30. Kafka Concepts - Consumers
      • Code
      • Start Consumer
      • Multithreaded consumer for multiple partitions
  • 31. Kafka Core Concepts - Recap
      • The who is who
          - Producers write data to brokers.
          - Consumers read data from brokers.
          - All this is distributed.
      • The data
          - Data is stored in topics.
          - Topics are split into partitions, which are replicated.
  • 33. Kafka – Monitoring and Testing
      • JMX enabled
      • System tools
      • Describe
      • Quantified Offset Monitor
      • Monitoring DEMO
  • 35. Apache ZooKeeper
      • Apache Kafka uses ZooKeeper to detect crashes, implement topic discovery, and maintain production & consumption state for topics.
      • High-performance coordination service for distributed applications.
      • SoC: separates coordination overhead from application logic.
      • Centralized service for naming (registry), configuration management, synchronization, and group membership.
      • ZooKeeper is the backbone for HBase, Solr, Facebook messaging apps & many more distributed apps.
      • Simple, replicated, ordered and fast.
  • 36. ZooKeeper - Internals
      • Znodes
          - Persistent: exists until deleted
          - Ephemeral: session scope
      • Reads are served by all nodes; writes go through the leader
      • Data is stored as a byte array
      • Allows watches and notifications
      • Ensemble: a group of servers available to serve requests
      • Leader selection is determined by quorum
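
Quorum here means a strict majority of the ensemble. The arithmetic is worth spelling out because it explains why ensembles are usually sized with an odd number of servers. The function names below are hypothetical and chosen for the example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


# ensemble size -> (quorum size, tolerated failures)
table = {n: (quorum_size(n), tolerated_failures(n)) for n in (1, 3, 4, 5)}
```

A 4-server ensemble tolerates one failure, the same as a 3-server ensemble, so the fourth server adds cost without adding fault tolerance.
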
  • 37. ZooKeeper – Guarantees
      • Follows the principles of atomic broadcast:
          - Sequential consistency: updates are applied in order
          - Atomicity: updates either succeed or fail
          - Single system image: same view of the service regardless of ZK server
          - Reliability: persistence of updates
          - Timeliness: the system is guaranteed to be up to date within a time bound
      • In summary: ZooKeeper { leader activation + message delivery }
  • 39. Kafka performance – Producer tests (LinkedIn benchmark test)
      • HW set-up with 2 Linux nodes
          - Each with 8 2 GHz cores (8 cores/machine ~ 16 GHz processing)
          - 16 GB of RAM, 6 disks with RAID 10 and a 1 Gb network connection
      • Producer test
          - Single producer publishes ~10 million messages of 200 bytes each
          - Kafka message batch sizes of 1 and 50; the other MQs use no batching
          - X-axis: messages sent to the broker; Y-axis: producer throughput
      • Why is the producer fast?
          - No ACK
          - Batching
          - Kafka storage format
  • 40. Kafka performance – Consumer tests (LinkedIn benchmark test)
      • HW set-up with 2 Linux nodes
          - Each with 8 2 GHz cores (8 cores/machine ~ 16 GHz processing)
          - 16 GB of RAM, 6 disks with RAID 10 and a 1 Gb network connection
      • Consumer test
          - Single consumer retrieves 10 million messages of 200 bytes each
          - Each pull request fetches 1000 messages (200 KB)
          - X-axis: messages consumed from the broker; Y-axis: consumer throughput
      • Why is the consumer fast?
          - No delivery state storage
          - Kafka storage format (less data transmitted)
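
The sizes quoted in the two benchmark slides can be cross-checked with simple arithmetic. The sketch counts message payload only and assumes one producer send per batch; protocol overhead is ignored, and the constant and function names are made up for this example.

```python
MESSAGE_COUNT = 10_000_000      # messages per test run
MESSAGE_SIZE_BYTES = 200        # payload size of each message
PULL_REQUEST_MESSAGES = 1_000   # messages fetched per consumer pull request

total_payload_bytes = MESSAGE_COUNT * MESSAGE_SIZE_BYTES
bytes_per_pull = PULL_REQUEST_MESSAGES * MESSAGE_SIZE_BYTES
pull_requests = MESSAGE_COUNT // PULL_REQUEST_MESSAGES


def send_calls(batch_size: int) -> int:
    """Number of producer sends needed for the whole run (ceiling division)."""
    return -(-MESSAGE_COUNT // batch_size)


sends_unbatched = send_calls(1)
sends_batched = send_calls(50)
```

The run moves about 2 GB of payload in total. A batch size of 50 cuts the number of producer sends from 10 million to 200,000, and the consumer needs 10,000 pull requests of 200 KB each.
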
  • 41. Summary, Conclusions & References
  • 42. Summary – quick Recap
      • Importance of handling & processing BigData
      • Scope for introduction in the Responsys architecture
      • Existing system bottlenecks & shortfalls
      • Distributed commit log
      • Kafka messaging
      • Kafka internals: ZooKeeper
      • Performance & feature comparisons: traditional vs new age
  • 43. Conclusion – Open ended
      • The limitation is on data, not on systems.
      • No need for a complete revamp.
      • Choosing the right systems at the right time is the recipe.
      References
      1. https://kafka.apache.org/
      2. http://zookeeper.apache.org/
      3. http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
      4. http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying