SlideShare uma empresa Scribd logo
1 de 27
REAL-TIME STREAMING AND DATA PIPELINES WITH
APACHE KAFKA
Real-time Streaming
Real-time Streaming with Data
Pipelines
Sync
Z-Y < 20ms
Y-X < 20ms
Async
Z-Y < 1ms
Y-X < 1ms
Real-time Streaming with Data
Pipelines
Sync
Y-X < 20ms
Async
Y-X < 1ms
Real-time Streaming with Data
Pipelines
Before we get started
 Samples
 https://github.com/stealthly/scala-kafka

 Apache Kafka 0.8.0 Source Code
 https://github.com/apache/kafka

 Docs
 http://kafka.apache.org/documentation.html

 Wiki
 https://cwiki.apache.org/confluence/display/KAFKA/Index

 Presentations
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations

Tools
 https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
 https://github.com/apache/kafka/tree/0.8/core/src/main/scala/kafka/tools
Apache Kafka
 Fast - A single Kafka broker can handle hundreds of megabytes of reads and writes per second
from thousands of clients.
 Scalable - Kafka is designed to allow a single cluster to serve as the central data backbone for a
large organization. It can be elastically and transparently expanded without downtime. Data
streams are partitioned and spread over a cluster of machines to allow data streams larger than
the capability of any single machine and to allow clusters of co-ordinated consumers
 Durable - Messages are persisted on disk and replicated within the cluster to prevent data loss.
Each broker can handle terabytes of messages without performance impact.

 Distributed by Design - Kafka has a modern cluster-centric design that offers strong durability
and fault-tolerance guarantees.
Up and running with Apache Kafka
1) Install Vagrant http://www.vagrantup.com/
2) Install Virtual Box https://www.virtualbox.org/
3) git clone https://github.com/stealthly/scala-kafka
4) cd scala-kafka
5) vagrant up
Zookeeper will be running on 192.168.86.5
BrokerOne will be running on 192.168.86.10
All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm)
6) ./sbt test
[success] Total time: 37 s, completed Dec 19, 2013 11:21:13 AM
Maven
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.0</version>
<exclusions>
<!-- https://issues.apache.org/jira/browse/KAFKA-1160 -->
<exclusion>
<groupId>com.sun.jdmk</groupId>
<artifactId>jmxtools</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jmx</groupId>
<artifactId>jmxri</artifactId>
</exclusion>
</exclusions>
</dependency>
SBT
"org.apache.kafka" % "kafka_2.10" % "0.8.0“ intransitive()
Or
libraryDependencies ++= Seq(
"org.apache.kafka" % "kafka_2.10" % "0.8.0",
)
https://issues.apache.org/jira/browse/KAFKA-1160
https://github.com/apache/kafka/blob/0.8/project/Build.scala?source=c
Apache Kafka
Producers

Consumers

Brokers

 Topics
 Brokers
 Sync Producers
 Async Producers
(Async/Sync)=> Akka Producers

 Topics
 Partitions
 Read from the start
 Read from where last left off

 Partitions
 Load Balancing Producers
 Load Balancing Consumer
 In Sync Replicaset (ISR)
 Client API
Producers
 Topics
 Brokers
 Sync Producers
 Async Producers
 (Async/Sync)=> Akka Producers
Producer

/** at least one of these for every partition **/
val producer = new KafkaProducer(“topic","192.168.86.10:9092")
producer.sendMessage(testMessage)
Kafka
Producer

case class KafkaProducer(
topic: String,
brokerList: String,
synchronously: Boolean = true,
compress: Boolean = true,
batchSize: Integer = 200,
messageSendMaxRetries: Integer = 3
){
val props = new Properties()
val codec = if(compress) DefaultCompressionCodec.codec else NoCompressionCodec.codec
props.put("compression.codec", codec.toString)
props.put("producer.type", if(synchronously) "sync" else "async")
props.put("metadata.broker.list", brokerList)
props.put("batch.num.messages", batchSize.toString)
props.put("message.send.max.retries", messageSendMaxRetries.toString)

wrapper for
core/src/main/kafka/producer/Producer.Scala

val producer = new Producer[AnyRef, AnyRef](new ProducerConfig(props))
def sendMessage(message: String) = {
try {
producer.send(new KeyedMessage(topic,message.getBytes))
} catch {
case e: Exception =>
e.printStackTrace
System.exit(1)
}
}
}
Docs
http://kafka.apache.org/documentation.html#producerconfigs

Producer
Config

Source
https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/
producer/ProducerConfig.scala?source=c
class KafkaAkkaProducer extends Actor with Logging {
private val producerCreated = new AtomicBoolean(false)
var producer: KafkaProducer = null
def init(topic: String, zklist: String) = {
if (!producerCreated.getAndSet(true)) {
producer = new KafkaProducer(topic,zklist)
}

Akka Producer

def receive = {
case t: (String, String) ⇒ {
init(t._1, t._2)
}
case msg: String ⇒ {
producer.sendString(msg)
}
}
}
Consumers
Topics
Partitions
Read from the start
Read from where last left off
class KafkaConsumer(
topic: String,
groupId: String,
zookeeperConnect: String,
readFromStartOfStream: Boolean = true
) extends Logging {
val props = new Properties()
props.put("group.id", groupId)
props.put("zookeeper.connect", zookeeperConnect)
props.put("auto.offset.reset", if(readFromStartOfStream) "smallest" else "largest")
val config = new ConsumerConfig(props)
val connector = Consumer.create(config)
val filterSpec = new Whitelist(topic)
val stream = connector.createMessageStreamsByFilter(filterSpec, 1, new DefaultDecoder(), new DefaultDecoder()).get(0)
def read(write: (Array[Byte])=>Unit) = {
for(messageAndTopic <- stream) {
write(messageAndTopic.message)
}
}
def close() {
connector.shutdown()
}
}
Source Sample
https://github.com/apache/kafka/blob/0.8/core/src/main/sca
la/kafka/tools/SimpleConsumerShell.scala?source=c

Low Level
Consumer
val topic = “publisher”
val consumer = new KafkaConsumer(topic ,”loop”,"192.168.86.5:2181")
val count = 2
val pdc = system.actorOf(Props[KafkaAkkaProducer].withRouter(RoundRobinRouter(count)), "router")
1 to countforeach { i =>(
pdc ! (topic,"192.168.86.10:9092"))

}
def exec(binaryObject: Array[Byte]) = {
val message = new String(binaryObject)
pdc ! message
}
pdc ! “go, go gadget stream 1”
pdc ! “go, go gadget stream 2”
consumer.read(exec)
Brokers
 Partitions
 Load Balancing Producers
 Load Balancing Consumer
 In Sync Replicaset (ISR)
 Client API
Partitions make up a Topic
Load Balancing Producer By
Partition
Load Balancing Consumer By
Partition
Replication – In Sync Replicas
Client API
o Developers Guide
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
o Non-JVM Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients
o Python
o Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.

oC
o High performance C library with full protocol support

o C++
o Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset.

o Go (aka golang)
o Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported.

o Ruby
o Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI 2.

o Clojure
o https://github.com/pingles/clj-kafka/ 0.8 branch
Questions?
/***********************************
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly

Twitter: @allthingshadoop
************************************/

Mais conteúdo relacionado

Mais procurados

Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lightbend
 
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Lightbend
 

Mais procurados (20)

Akka streams scala italy2015
Akka streams scala italy2015Akka streams scala italy2015
Akka streams scala italy2015
 
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
Lessons Learned From PayPal: Implementing Back-Pressure With Akka Streams And...
 
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
Modernizing Infrastructures for Fast Data with Spark, Kafka, Cassandra, React...
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
 
Akka streams
Akka streamsAkka streams
Akka streams
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka StreamsKafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
Specs2 whirlwind tour at Scaladays 2014
Specs2 whirlwind tour at Scaladays 2014Specs2 whirlwind tour at Scaladays 2014
Specs2 whirlwind tour at Scaladays 2014
 
Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
Introducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache KafkaIntroducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache Kafka
 
Reactor, Reactive streams and MicroServices
Reactor, Reactive streams and MicroServicesReactor, Reactive streams and MicroServices
Reactor, Reactive streams and MicroServices
 
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
 
Asynchronous Orchestration DSL on squbs
Asynchronous Orchestration DSL on squbsAsynchronous Orchestration DSL on squbs
Asynchronous Orchestration DSL on squbs
 
Akka Streams - From Zero to Kafka
Akka Streams - From Zero to KafkaAkka Streams - From Zero to Kafka
Akka Streams - From Zero to Kafka
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. SaxIntroducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax
Introducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax
 

Destaque

Destaque (6)

Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Kafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier ArchitecturesKafka at Scale: Multi-Tier Architectures
Kafka at Scale: Multi-Tier Architectures
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 

Semelhante a Real-time streaming and data pipelines with Apache Kafka

Spark Streaming Info
Spark Streaming InfoSpark Streaming Info
Spark Streaming Info
Doug Chang
 
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
HostedbyConfluent
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
Guido Schmutz
 
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
confluent
 

Semelhante a Real-time streaming and data pipelines with Apache Kafka (20)

Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
Writing Blazing Fast, and Production-Ready Kafka Streams apps in less than 30...
 
Spark Streaming Info
Spark Streaming InfoSpark Streaming Info
Spark Streaming Info
 
Sparkstreaming
SparkstreamingSparkstreaming
Sparkstreaming
 
Simplifying Migration from Kafka to Pulsar - Pulsar Summit NA 2021
Simplifying Migration from Kafka to Pulsar - Pulsar Summit NA 2021Simplifying Migration from Kafka to Pulsar - Pulsar Summit NA 2021
Simplifying Migration from Kafka to Pulsar - Pulsar Summit NA 2021
 
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love5 Takeaways from Migrating a Library to Scala 3 - Scala Love
5 Takeaways from Migrating a Library to Scala 3 - Scala Love
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Kafka streams - From pub/sub to a complete stream processing platform
Kafka streams - From pub/sub to a complete stream processing platformKafka streams - From pub/sub to a complete stream processing platform
Kafka streams - From pub/sub to a complete stream processing platform
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
 
Training
TrainingTraining
Training
 
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around Kafka
 
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
Streaming Design Patterns Using Alpakka Kafka Connector (Sean Glover, Lightbe...
 
Apache Camel - The integration library
Apache Camel - The integration libraryApache Camel - The integration library
Apache Camel - The integration library
 
Scala active record
Scala active recordScala active record
Scala active record
 
Spring into rails
Spring into railsSpring into rails
Spring into rails
 
Introduction to Apache Camel
Introduction to Apache CamelIntroduction to Apache Camel
Introduction to Apache Camel
 
Continuous Delivery: The Next Frontier
Continuous Delivery: The Next FrontierContinuous Delivery: The Next Frontier
Continuous Delivery: The Next Frontier
 
Big data and hadoop training - Session 5
Big data and hadoop training - Session 5Big data and hadoop training - Session 5
Big data and hadoop training - Session 5
 
2017 meetup-apache-kafka-nov
2017 meetup-apache-kafka-nov2017 meetup-apache-kafka-nov
2017 meetup-apache-kafka-nov
 
Apache Tuscany 2.x Extensibility And SPIs
Apache Tuscany 2.x Extensibility And SPIsApache Tuscany 2.x Extensibility And SPIs
Apache Tuscany 2.x Extensibility And SPIs
 

Mais de Joe Stein

Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
Joe Stein
 

Mais de Joe Stein (20)

Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Get started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache MesosGet started with Developing Frameworks in Go on Apache Mesos
Get started with Developing Frameworks in Go on Apache Mesos
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache Mesos
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache Mesos
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on Mesos
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Introduction to Apache Mesos
Introduction to Apache MesosIntroduction to Apache Mesos
Introduction to Apache Mesos
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite Columns
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Real-time streaming and data pipelines with Apache Kafka

  • 1. REAL-TIME STREAMING AND DATA PIPELINES WITH APACHE KAFKA
  • 3. Real-time Streaming with Data Pipelines Sync Z-Y < 20ms Y-X < 20ms Async Z-Y < 1ms Y-X < 1ms
  • 4. Real-time Streaming with Data Pipelines Sync Y-X < 20ms Async Y-X < 1ms
  • 5. Real-time Streaming with Data Pipelines
  • 6. Before we get started  Samples  https://github.com/stealthly/scala-kafka  Apache Kafka 0.8.0 Source Code  https://github.com/apache/kafka  Docs  http://kafka.apache.org/documentation.html  Wiki  https://cwiki.apache.org/confluence/display/KAFKA/Index  Presentations  https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations Tools  https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools  https://github.com/apache/kafka/tree/0.8/core/src/main/scala/kafka/tools
  • 7. Apache Kafka  Fast - A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients.  Scalable - Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers  Durable - Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact.  Distributed by Design - Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
  • 8. Up and running with Apache Kafka 1) Install Vagrant http://www.vagrantup.com/ 2) Install Virtual Box https://www.virtualbox.org/ 3) git clone https://github.com/stealthly/scala-kafka 4) cd scala-kafka 5) vagrant up Zookeeper will be running on 192.168.86.5 BrokerOne will be running on 192.168.86.10 All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm) 6) ./sbt test [success] Total time: 37 s, completed Dec 19, 2013 11:21:13 AM
  • 10. SBT "org.apache.kafka" % "kafka_2.10" % "0.8.0“ intransitive() Or libraryDependencies ++= Seq( "org.apache.kafka" % "kafka_2.10" % "0.8.0", ) https://issues.apache.org/jira/browse/KAFKA-1160 https://github.com/apache/kafka/blob/0.8/project/Build.scala?source=c
  • 11. Apache Kafka Producers Consumers Brokers  Topics  Brokers  Sync Producers  Async Producers (Async/Sync)=> Akka Producers  Topics  Partitions  Read from the start  Read from where last left off  Partitions  Load Balancing Producers  Load Balancing Consumer  In Sync Replicaset (ISR)  Client API
  • 12. Producers  Topics  Brokers  Sync Producers  Async Producers  (Async/Sync)=> Akka Producers
  • 13. Producer /** at least one of these for every partition **/ val producer = new KafkaProducer(“topic","192.168.86.10:9092") producer.sendMessage(testMessage)
  • 14. Kafka Producer case class KafkaProducer( topic: String, brokerList: String, synchronously: Boolean = true, compress: Boolean = true, batchSize: Integer = 200, messageSendMaxRetries: Integer = 3 ){ val props = new Properties() val codec = if(compress) DefaultCompressionCodec.codec else NoCompressionCodec.codec props.put("compression.codec", codec.toString) props.put("producer.type", if(synchronously) "sync" else "async") props.put("metadata.broker.list", brokerList) props.put("batch.num.messages", batchSize.toString) props.put("message.send.max.retries", messageSendMaxRetries.toString) wrapper for core/src/main/kafka/producer/Producer.Scala val producer = new Producer[AnyRef, AnyRef](new ProducerConfig(props)) def sendMessage(message: String) = { try { producer.send(new KeyedMessage(topic,message.getBytes)) } catch { case e: Exception => e.printStackTrace System.exit(1) } } }
  • 16. class KafkaAkkaProducer extends Actor with Logging { private val producerCreated = new AtomicBoolean(false) var producer: KafkaProducer = null def init(topic: String, zklist: String) = { if (!producerCreated.getAndSet(true)) { producer = new KafkaProducer(topic,zklist) } Akka Producer def receive = { case t: (String, String) ⇒ { init(t._1, t._2) } case msg: String ⇒ { producer.sendString(msg) } } }
  • 17. Consumers Topics Partitions Read from the start Read from where last left off
  • 18. class KafkaConsumer( topic: String, groupId: String, zookeeperConnect: String, readFromStartOfStream: Boolean = true ) extends Logging { val props = new Properties() props.put("group.id", groupId) props.put("zookeeper.connect", zookeeperConnect) props.put("auto.offset.reset", if(readFromStartOfStream) "smallest" else "largest") val config = new ConsumerConfig(props) val connector = Consumer.create(config) val filterSpec = new Whitelist(topic) val stream = connector.createMessageStreamsByFilter(filterSpec, 1, new DefaultDecoder(), new DefaultDecoder()).get(0) def read(write: (Array[Byte])=>Unit) = { for(messageAndTopic <- stream) { write(messageAndTopic.message) } } def close() { connector.shutdown() } }
  • 20. val topic = “publisher” val consumer = new KafkaConsumer(topic ,”loop”,"192.168.86.5:2181") val count = 2 val pdc = system.actorOf(Props[KafkaAkkaProducer].withRouter(RoundRobinRouter(count)), "router") 1 to countforeach { i =>( pdc ! (topic,"192.168.86.10:9092")) } def exec(binaryObject: Array[Byte]) = { val message = new String(binaryObject) pdc ! message } pdc ! “go, go gadget stream 1” pdc ! “go, go gadget stream 2” consumer.read(exec)
  • 21. Brokers  Partitions  Load Balancing Producers  Load Balancing Consumer  In Sync Replicaset (ISR)  Client API
  • 23. Load Balancing Producer By Partition
  • 24. Load Balancing Consumer By Partition
  • 25. Replication – In Sync Replicas
  • 26. Client API o Developers Guide https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol o Non-JVM Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients o Python o Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported. oC o High performance C library with full protocol support o C++ o Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset. o Go (aka golang) o Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported. o Ruby o Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI 2. o Clojure o https://github.com/pingles/clj-kafka/ 0.8 branch
  • 27. Questions? /*********************************** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop ************************************/