Introduction to Apache Kafka - Part 1

Apache Kafka is an open-source message broker project developed by the Apache Software Foundation and written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. This is the first part of the presentation; the second part is available at http://www.slideshare.net/knoldus/introduction-to-apache-kafka-part-2

Topics Covered
➢ What is Kafka
➢ Why Kafka
➢ High level overview
➢ Use cases
➢ Key terminology
➢ Partitions distribution over brokers
➢ Replication protocol
➢ Demo

What is Kafka
➢ publish-subscribe messaging system
➢ fast
➢ distributed by design
➢ fault tolerant
➢ scalable
➢ durable
➢ written in Scala
➢ free and open source

Building Data Pipelines
This is bad data pipelining.

Building Data Pipelines
Kafka decouples data pipelines.

Use cases
➢ Messaging
➢ Website Activity Tracking
➢ Metrics
➢ Log Aggregation
➢ Real-Time Stream Processing
➢ Event Sourcing
➢ Commit Log
➢ Internet Of Things (IoT)

Anatomy of a Topic
For each topic, the Kafka cluster maintains a partitioned log that looks like this:
http://kafka.apache.org/images/log_anatomy.png
The number of partitions for a topic is configurable. In this example, the topic has three partitions.
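To make the structure concrete, here is a minimal in-memory sketch, assuming each partition is modelled as a Python list so that a message's offset is simply its index. The names `Partition`, `Topic` and `page-views` are hypothetical; real Kafka stores each partition as segment files on a broker's disk.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only sequence of messages; a message's offset is its index."""
    messages: list = field(default_factory=list)

    def append(self, message):
        # New messages always go to the end; existing entries are never modified.
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to this message

    def read(self, offset):
        # Return every message from `offset` up to the current end of the log.
        return self.messages[offset:]


@dataclass
class Topic:
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # One independent log per partition.
        self.partitions = [Partition() for _ in range(self.num_partitions)]


topic = Topic("page-views", num_partitions=3)
print(topic.partitions[0].append("m0"))  # 0
print(topic.partitions[0].append("m1"))  # 1
print(topic.partitions[1].append("m2"))  # 0: offsets are numbered per partition
print(topic.partitions[0].read(1))       # ['m1']
```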

Reading & Writing From Topic
Topic with two partitions:
https://content.linkedin.com/content/dam/engineering/en-us/blog/migrated/partitioned_log_0.png
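The read/write pattern can be sketched as follows, assuming a hypothetical two-partition topic held in memory. The producer only ever appends to the end of a partition, while each consumer keeps its own offset, so two consumers can read the same data at different speeds without removing it. The round-robin producer and the `Consumer` class are illustrative assumptions, not Kafka client APIs.

```python
from itertools import cycle


class Consumer:
    """Tracks its own read position (offset) in each partition; reading never deletes data."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.offsets = [0] * len(partitions)

    def poll(self, p):
        # Return everything after this consumer's offset, then advance the offset.
        batch = self.partitions[p][self.offsets[p]:]
        self.offsets[p] += len(batch)
        return batch


# Hypothetical topic with two partitions, each modelled as a plain list.
partitions = [[], []]

# Writing: the producer appends to the end of a partition, alternating round-robin.
target = cycle(range(len(partitions)))
for i in range(5):
    partitions[next(target)].append(f"msg-{i}")

# Reading: two consumers read the same partition at independent positions.
fast, slow = Consumer(partitions), Consumer(partitions)
print(fast.poll(0))  # ['msg-0', 'msg-2', 'msg-4']
print(fast.poll(0))  # [] (already caught up)
print(slow.poll(0))  # ['msg-0', 'msg-2', 'msg-4']
```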

Partitions Distribution
Who is responsible for these tasks?

Responsibilities of the Controller
● managing the states of partitions and replicas
● performing administrative tasks like reassigning partitions
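One of the controller's jobs is deciding which brokers host each partition's replicas. The sketch below uses a plain round-robin rule, assuming hypothetical broker IDs 101 to 103; Kafka's own assignment also randomizes the starting broker and shifts follower placement, so treat this as a simplified illustration. As in Kafka, the first broker in each replica list is the partition's preferred leader, and a replication factor larger than the broker count is rejected.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Simplified round-robin placement: replica i of partition p goes to broker (p + i) mod n.

    The first broker in each list is the partition's preferred leader.
    """
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


for partition, replicas in assign_replicas(4, 3, [101, 102, 103]).items():
    print(partition, replicas)
# 0 [101, 102, 103]
# 1 [102, 103, 101]
# 2 [103, 101, 102]
# 3 [101, 102, 103]
```

Because the preferred leaders rotate (101, 102, 103, 101), leadership is spread across the brokers rather than concentrated on one.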

Partition Roles
➢ Each partition has one server that acts as the "leader" and zero or more servers that act as "followers".
➢ The leader handles all read and write requests for the partition, while the followers passively replicate the leader.
➢ If the leader fails, one of the followers automatically becomes the new leader.
➢ Each server acts as the leader for some of its partitions and as a follower for others, so load is well balanced within the cluster.
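The failover rule can be sketched as a function that walks the replica list in assignment order and returns the first broker that is both alive and fully caught up ("in sync"). This is roughly how Kafka's controller performs a clean leader election, but the function name and the plain-set inputs are assumptions for illustration; real elections also involve ZooKeeper state and an optional unclean-election mode.

```python
def elect_leader(replicas, live_brokers, in_sync):
    """Pick the first replica, in assignment order, that is alive and in sync.

    Returns None when no eligible replica remains (the partition goes offline).
    """
    for broker in replicas:
        if broker in live_brokers and broker in in_sync:
            return broker
    return None


replicas = [101, 102, 103]  # assignment order; 101 is the preferred leader
in_sync = {101, 102, 103}
print(elect_leader(replicas, {101, 102, 103}, in_sync))  # 101
print(elect_leader(replicas, {102, 103}, in_sync))       # 102 after broker 101 fails
print(elect_leader(replicas, {103}, {101, 102}))         # None: 103 is alive but lagging
```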

Basic Operations
● List all topics:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
● Describe a topic:
$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic topic-name

Basic Operations
Adding a topic:
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic topic_name
Modifying a topic (increasing the number of partitions):
$ bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic my_topic_name --partitions 4
Deleting a topic:
$ bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic my_topic_name
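Increasing the partition count does not move existing data, and it changes where new messages with a given key land, which can disturb consumers that rely on key-based partitioning (the Kafka documentation warns about this). The sketch below counts the affected keys, assuming a hash-mod-N partitioner; CRC-32 stands in for the murmur2 hash that Kafka's default partitioner uses, and the `user-N` keys are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash: CRC-32 of the key bytes (Kafka's default partitioner uses murmur2).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_count, new_count):
    """Keys whose partition changes when the partition count goes from old_count to new_count."""
    return [k for k in keys if partition_for(k, old_count) != partition_for(k, new_count)]


keys = [f"user-{i}" for i in range(1000)]
moved = moved_keys(keys, old_count=1, new_count=4)
print(f"{len(moved)} of {len(keys)} keys now map to a different partition")
```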

Basic Operations
Balancing Leadership:
$ bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181
Or configure Kafka to do this automatically by setting the following broker configuration:
auto.leader.rebalance.enable=true
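After a broker restarts, leadership can remain on the other replicas, leaving some brokers with more leaders than others. A preferred-replica election moves each partition's leadership back to the first broker in its replica list, provided that broker is in sync. The helper below is a hypothetical sketch of that check, not part of Kafka's tooling; the broker IDs and replica states are made up for illustration.

```python
def preferred_leader_moves(assignment, current_leaders, in_sync):
    """Partitions whose leadership would move back to the preferred (first-listed) replica.

    assignment: partition -> ordered replica list
    current_leaders: partition -> broker currently leading
    in_sync: partition -> set of in-sync brokers
    """
    moves = {}
    for partition, replicas in assignment.items():
        preferred = replicas[0]
        if current_leaders[partition] != preferred and preferred in in_sync[partition]:
            moves[partition] = (current_leaders[partition], preferred)
    return moves


assignment = {0: [101, 102], 1: [102, 101], 2: [101, 102]}
# Broker 101 restarted earlier, so broker 102 still leads every partition.
leaders = {0: 102, 1: 102, 2: 102}
in_sync = {0: {101, 102}, 1: {101, 102}, 2: {102}}
print(preferred_leader_moves(assignment, leaders, in_sync))
# {0: (102, 101)}; partition 2 waits until 101 is back in sync
```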

References
● http://kafka.apache.org/documentation.html
● https://engineering.linkedin.com/kafka/benchmarking-apache-k
● http://www.confluent.io/blog/tutorial-getting-started-with-the-new
● http://kafka-summit.org
● http://www.confluent.io/blog/hands-free-kafka-replication-a-less

Thanks
Presenters:
@_himaniarora
@_satendrakumar
Organizer:
@knolspeak
http://www.knoldus.com

Title slide: "Introduction to Apache Kafka-01", presented by Himani Arora (Software Consultant, Knoldus Software LLP) and Satendra Kumar (Sr. Software Consultant, Knoldus Software LLP).

Speaker Notes

1) Teams spend 10 to 20% of their time on data integration. 2) This approach is not scalable. 3) A push-based system does not work.
Topics are the high-level abstraction that Kafka provides. A topic is a category or feed name to which messages are published.
The topics are further divided into partitions.
Each partition is an ordered, immutable sequence of messages that is continually appended to—a commit log. The messages in the partitions are each assigned a sequential id number called the offset that uniquely identifies each message within the partition.
Producers publish data to the topics of their choice. The producer is responsible for choosing which message to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the message). More on the use of partitioning in a second.
1) The key abstraction in Kafka is the topic. 2) Producers publish their records to a topic, and consumers subscribe to one or more topics. 3) A Kafka topic is just a sharded write-ahead log. 4) Producers append records to these logs and consumers subscribe to changes. 5) Each record is a key/value pair. The key is used for assigning the record to a log partition (unless the publisher specifies the partition directly).
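The partition choice described in these notes can be sketched as a small class, assuming this precedence: an explicitly specified partition wins, otherwise a keyed record is hashed, otherwise records are spread round-robin. `SketchPartitioner` is hypothetical and uses CRC-32 instead of Kafka's murmur2 hash; newer Kafka producers also use a "sticky" strategy for keyless records rather than strict round-robin.

```python
import zlib
from itertools import count


class SketchPartitioner:
    """Chooses a partition: explicit partition, else hash of key, else round-robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = count()

    def partition(self, key=None, partition=None):
        if partition is not None:  # the publisher specified the partition directly
            return partition
        if key is not None:  # same key -> same partition, preserving per-key order
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        return next(self._counter) % self.num_partitions  # no key: spread the load


p = SketchPartitioner(3)
print(p.partition(key="user-42") == p.partition(key="user-42"))  # True
print([p.partition() for _ in range(4)])                         # [0, 1, 2, 0]
print(p.partition(key="anything", partition=2))                  # 2
```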
Each node in the cluster is called a Kafka broker.