2. www.edureka.co/apache-kafka
What will you learn today ?
What is Apache Kafka ?
Architecture of Kafka
Multiple ways of setting Kafka cluster
Comparing Kafka with other messaging systems
Hands-On : Getting started with Kafka
Data : The Ingredient
Data is the main ingredient of Internet applications and typically includes the following :
Page visits and clicks
User activities
Events corresponding to logins
Social networking activities such as likes, shares, and comments
Application-specific metrics (e.g. logs, page load time, performance)
Need : Real Time Analytics
In today's applications, activity data has become part of production data and is used to run analytics in real time. These analytics include:
Delivering advertisements to the masses
Detecting abnormal user behavior or application hacking
Search based on relevance
Recommendations based on popularity
Messaging Systems
Messaging systems provide seamless integration among distributed applications with the help of messages that are shared between them
Problem : In the present big-data era, the first challenge is to collect the data, as its volume is huge, and the second challenge is to analyze it
Solution : One way to address both challenges is to use a messaging system
Apache Kafka
Apache Kafka is a distributed publish-subscribe messaging system
It was originally developed at LinkedIn and later became part of the Apache project
Kafka is fast, scalable, durable and distributed by design
Kafka Architecture
[Diagram: several Producers publish to the Kafka Cluster, from which several Consumers read]
A stream of messages of a particular category is called a topic. Producers publish messages to a topic
A producer can be any application that publishes messages to a topic
Consumers subscribe to topics and consume the messages
A Kafka cluster is a set of servers, each of which is called a broker
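The roles above can be illustrated with a toy in-memory model. This is only a sketch of the publish-subscribe idea, not the Kafka client API: the class names `Broker`, `Producer`, and `Consumer`, their methods, and the topic names are all hypothetical, and a single Python object stands in for the whole cluster.

```python
from collections import defaultdict


class Broker:
    """Toy stand-in for one broker: keeps an append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)  # topic name -> ordered list of messages

    def append(self, topic, message):
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1  # offset of the stored message

    def read(self, topic, offset):
        return self.logs[topic][offset:]


class Producer:
    """Any application that publishes messages to a topic."""

    def __init__(self, broker):
        self.broker = broker

    def publish(self, topic, message):
        return self.broker.append(topic, message)


class Consumer:
    """Subscribes to topics and remembers how far it has read in each."""

    def __init__(self, broker):
        self.broker = broker
        self.offsets = {}  # topic -> next offset to read

    def subscribe(self, topic):
        self.offsets.setdefault(topic, 0)

    def poll(self):
        received = []
        for topic, offset in list(self.offsets.items()):
            messages = self.broker.read(topic, offset)
            received.extend((topic, m) for m in messages)
            self.offsets[topic] = offset + len(messages)
        return received


if __name__ == "__main__":
    broker = Broker()
    producer = Producer(broker)
    consumer = Consumer(broker)
    consumer.subscribe("page-views")
    producer.publish("page-views", "user=42 url=/home")
    producer.publish("logins", "user=42")  # not subscribed, so not delivered
    print(consumer.poll())
```

In this sketch the consumer only receives messages from topics it has subscribed to, and a second call to `poll()` returns nothing new until a producer publishes again.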
ZooKeeper and Kafka
Each Kafka broker coordinates with other Kafka brokers using ZooKeeper
Producers and consumers are notified by the ZooKeeper service when a new broker joins the Kafka system or an existing broker fails
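The notification pattern described here can be sketched as a small registry with callbacks. This is a toy illustration only and does not use the ZooKeeper API: `BrokerRegistry`, `watch`, `broker_joined`, and `broker_failed` are hypothetical names standing in for the coordination service.

```python
class BrokerRegistry:
    """Toy stand-in for the coordination service: tracks live brokers
    and notifies registered watchers when membership changes."""

    def __init__(self):
        self.live_brokers = set()
        self.watchers = []

    def watch(self, callback):
        # A producer or consumer registers interest in membership changes.
        self.watchers.append(callback)

    def _notify(self, event, broker_id):
        for callback in self.watchers:
            callback(event, broker_id)

    def broker_joined(self, broker_id):
        if broker_id not in self.live_brokers:
            self.live_brokers.add(broker_id)
            self._notify("joined", broker_id)

    def broker_failed(self, broker_id):
        if broker_id in self.live_brokers:
            self.live_brokers.remove(broker_id)
            self._notify("failed", broker_id)


if __name__ == "__main__":
    registry = BrokerRegistry()
    registry.watch(lambda event, broker_id: print(f"broker {broker_id} {event}"))
    registry.broker_joined(1)
    registry.broker_joined(2)
    registry.broker_failed(1)
```

Each watcher is called once per actual change, so a client can refresh its view of which brokers are available.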
Kafka Clusters
With Kafka we can create multiple types of clusters, such as the following :
Single-node, single-broker cluster
Single-node, multiple-broker cluster
Multiple-node, multiple-broker cluster
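For the single-node, multiple-broker case, each broker on the machine needs its own ID, port, and log directory. The sketch below generates those per-broker overrides, assuming the standard broker settings `broker.id`, `listeners`, and `log.dirs`; the helper names, the base port, and the directory paths are illustrative assumptions, not values from the slides.

```python
def broker_properties(broker_count, base_port=9092, log_root="/tmp/kafka-logs"):
    """Return one dict of property overrides per broker on a single node.

    Assumption: every broker shares the host, so IDs, ports, and log
    directories must all be distinct.
    """
    configs = []
    for broker_id in range(broker_count):
        configs.append({
            "broker.id": str(broker_id),
            "listeners": f"PLAINTEXT://:{base_port + broker_id}",
            "log.dirs": f"{log_root}-{broker_id}",
        })
    return configs


def render(config):
    """Format one broker's overrides as key=value lines for a properties file."""
    return "\n".join(f"{key}={value}" for key, value in config.items())


if __name__ == "__main__":
    for config in broker_properties(3):
        print(render(config))
        print()
```

A multiple-node cluster follows the same idea, except that brokers on different machines can reuse the same port and directory as long as each `broker.id` stays unique.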
Kafka @ LinkedIn
LinkedIn notifications are powered by Kafka
Apart from this, LinkedIn uses Kafka for many other purposes, such as log monitoring, performance metrics, and search improvement
Who else uses Kafka ?
DataSift uses Kafka as a collector of monitoring events and to track users' consumption of data streams in real time
Wooga uses Kafka to aggregate and process tracking data from all their Facebook games (hosted at various providers) in a central location
Spongecell uses Kafka to run their entire analytics and monitoring pipeline, driving both real-time and ETL applications
Loggly is the world's most popular cloud-based log management service. It uses Kafka for log collection
An exhaustive list of companies using Kafka can be found here : https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Survey
Your feedback is vital to us, be it a compliment, a suggestion, or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.
Course Details
Edureka's Apache Kafka course:
• Introduction of course
• Online Live Classes: 15 hours
• Assignments: 25 hours
• Project: 20 hours
• Lifetime Access + 24 X 7 Support
Go to www.edureka.co/apache-kafka
Batch starts from 07 November (Weekend Batch)