Developing with the Go client for Apache Kafka
https://github.com/stealthly/go_kafka_client
Joe Stein
• Developer, Architect & Technologist
• Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly
Big Data Open Source Security LLC provides professional services and product solutions for the collection,
storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and
distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data
Infrastructure Components to use but also how to change their existing (or build new) systems to work with
them.
• Apache Kafka Committer & PMC member
• Blog & Podcast - http://allthingshadoop.com
• Twitter @allthingshadoop
Overview
● Just enough Kafka
● Just enough Go
● Why a new Go Kafka Client
● Producers in Go
o Code
o Syslog
o MirrorMaker
● Consumers in Go
o Work management
o Partition ownership
o Blue / Green Deploys
o Offset management
● Distributed Reactive Streams
Apache Kafka
• Apache Kafka
o http://kafka.apache.org
• Apache Kafka Source Code
o https://github.com/apache/kafka
• Documentation
o http://kafka.apache.org/documentation.html
• FAQ
o https://cwiki.apache.org/confluence/display/KAFKA/FAQ
• Wiki
o https://cwiki.apache.org/confluence/display/KAFKA/Index
Kafka is a distributed, partitioned, replicated commit log service. It provides
the functionality of a messaging system, but with a unique design.
I Heart Logs
http://shop.oreilly.com/product/0636920034339.do?sortby=publicationDate
Really Quick Start (Go)
1) Install Vagrant http://www.vagrantup.com/
2) Install Virtual Box https://www.virtualbox.org/
3) git clone https://github.com/stealthly/go-kafka
4) cd go-kafka
5) vagrant up
6) vagrant ssh brokerOne
7) cd /vagrant
8) sudo ./test.sh
Just enough Go
● Produces statically linked native binaries without external
dependencies
● Built-in package management with source commit definition
● Share memory by communicating
● Code to read when learning Go:
http://www.somethingsimilar.com/2013/12/27/code-to-read-when-learning-go/
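The "share memory by communicating" proverb above can be sketched in a few lines: instead of guarding a shared counter with a mutex, one goroutine computes a value and hands it over a channel, so only one goroutine ever owns it (a minimal illustrative example, not from the deck):

```go
package main

import "fmt"

// sumViaChannel computes 1+2+…+n in a worker goroutine and receives the
// result over a channel. Ownership of the value is transferred by the
// send, so no lock is needed.
func sumViaChannel(n int) int {
	results := make(chan int)
	go func() {
		sum := 0
		for i := 1; i <= n; i++ {
			sum += i
		}
		results <- sum // hand the value over to the receiver
	}()
	return <-results
}

func main() {
	fmt.Println(sumViaChannel(10)) // prints 55
}
```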
Why another Go client?
● Go Kafka Client https://github.com/stealthly/go_kafka_client
wraps Sarama https://github.com/Shopify/sarama
● More real-world use for producers
● High level consumer (lots more on this in a few minutes)
● The option to wrap librdkafka
https://github.com/edenhill/librdkafka, the high-performance
C/C++ library, in the future
Go Kafka Client - Getting started
Prerequisites:
1. Install Golang http://golang.org/doc/install
2. Make sure env variables GOPATH and GOROOT exist and point to correct places
3. Install GPM https://github.com/pote/gpm
4. go get github.com/stealthly/go_kafka_client && cd $GOPATH/src/github.com/stealthly/go_kafka_client
5. gpm install
Optional (for all tests to work):
1. Install Docker https://docs.docker.com/installation/#installation
2. cd $GOPATH/src/github.com/stealthly/go_kafka_client
3. Build docker image: docker build -t stealthly/go_kafka_client .
Producers
client, err := sarama.NewClient(uuid.New(), []string{brokerConnect}, sarama.NewClientConfig())
if err != nil {
	panic(err)
}
config := sarama.NewProducerConfig()
config.FlushMsgCount = flushMsgCount
config.FlushFrequency = flushFrequency
config.AckSuccesses = true // deliveries are reported on producer.Successes()
config.RequiredAcks = sarama.NoResponse // fire-and-forget; use sarama.WaitForAll for the strongest guarantee
config.MaxMessagesPerReq = maxMessagesPerReq
config.Timeout = 1000 * time.Millisecond
config.Compression = 2 // Kafka compression codec id 2 = snappy
producer, err := sarama.NewProducer(client, config)
if err != nil {
	panic(err)
}
Producer Code
Magic Happens Here =8^)
go func() {
	for {
		message := &sarama.MessageToSend{
			Topic: topic,
			Key:   sarama.StringEncoder(fmt.Sprintf("%d", numMessage)),
			Value: sarama.StringEncoder(fmt.Sprintf("message %d!", numMessage)),
		}
		numMessage++
		producer.Input() <- message
		time.Sleep(sleepTime)
	}
}()
go func() {
	for {
		select {
		case err := <-producer.Errors():
			saramaError <- err
		case success := <-producer.Successes():
			saramaSuccess <- success
		}
	}
}()
https://github.com/stealthly/go_kafka_client/tree/master/syslog
docker run --net=host stealthly/syslog --topic syslog --broker.list host:port
--producer.config - property file to configure embedded producers. This parameter is optional.
--tcp.port - TCP port to listen for incoming syslog messages. Defaults to 5140.
--tcp.host - TCP host to listen for incoming syslog messages. Defaults to 0.0.0.0.
--udp.port - UDP port to listen for incoming syslog messages. Defaults to 5141.
--udp.host - UDP host to listen for incoming syslog messages. Defaults to 0.0.0.0.
--num.producers - number of producer instances. This can be used to increase throughput. Defaults to 1.
--queue.size - number of messages that are buffered for producing. Defaults to 10000.
--log.level - log level for built-in logger. Possible values are: trace, debug, info, warn, error, critical. Defaults to info.
--max.procs - maximum number of CPUs that can be executing simultaneously. Defaults to runtime.NumCPU().
Syslog Producer
Syslog Producer + Metadata
option to transform via --source “anything” --tag key1=value1 --tag key2=value2 --log.type 3
package syslog.proto;
message LogLine
{
message Tag
{
required string key = 1;
required string value = 2;
}
required string line = 1;
optional string source = 2 [default = ""];
repeated Tag tag = 3;
optional int64 logtypeid = 4 [default = 0];
repeated int64 timings = 5;
}
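For readers who don't use protobuf tooling, the Go type for the LogLine message above is roughly shaped like the sketch below (illustrative only, with hypothetical field and constructor names; the real type would come from protoc's Go plugin):

```go
package main

import "fmt"

// Tag mirrors the nested Tag message: a required key/value pair.
type Tag struct {
	Key   string
	Value string
}

// LogLine mirrors the syslog.proto LogLine message. Optional fields carry
// their proto defaults; Timings collects int64 timestamps appended as the
// line moves through the pipeline.
type LogLine struct {
	Line      string  // required string line = 1
	Source    string  // optional string source = 2, default ""
	Tags      []Tag   // repeated Tag tag = 3
	LogTypeID int64   // optional int64 logtypeid = 4, default 0
	Timings   []int64 // repeated int64 timings = 5
}

// newLogLine is a hypothetical constructor: zero values for Source and
// LogTypeID happen to match the proto defaults.
func newLogLine(line string) *LogLine {
	return &LogLine{Line: line}
}

func main() {
	ll := newLogLine("kernel: oom-killer invoked")
	ll.Tags = append(ll.Tags, Tag{Key: "dc", Value: "dc1"})
	fmt.Println(ll.Line, len(ll.Tags)) // prints the line and 1
}
```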
MirrorMaker
https://github.com/stealthly/go_kafka_client/tree/master/mirrormaker
● Preserve order
● Preserve partition number
● Prefix destination topic (e.g. dc1_) so you know where it
came from and avoid collision
● Everything else you expect
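The destination-topic prefixing described above amounts to a tiny pure function (a hypothetical helper for illustration, not the repo's code):

```go
package main

import "fmt"

// prefixTopic derives the destination topic from the source topic, e.g. a
// topic "logs" mirrored from dc1 becomes "dc1_logs". The prefix both
// records where the data came from and keeps source and destination
// topic names from colliding.
func prefixTopic(prefix, topic string) string {
	return prefix + topic
}

func main() {
	fmt.Println(prefixTopic("dc1_", "logs")) // prints dc1_logs
}
```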
Consumer
Work management
● Fan out
● Sequential processing guarantees
● Ack communication from work to retry
● Failure “dead letter” so work can continue
Partition Ownership
● Not just “re-balance”
● Consistent single known state
● Strategies (plural)
o Round Robin
o Range
o More to come! e.g. manual assignment
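As a rough sketch of what the round-robin strategy above decides (illustrative only; the client's real implementation also coordinates ownership state between consumers):

```go
package main

import "fmt"

// assignRoundRobin deals partitions out to consumers one at a time, so
// any two consumers own partition counts that differ by at most one.
func assignRoundRobin(consumers []string, partitions []int) map[string][]int {
	owned := make(map[string][]int)
	for i, p := range partitions {
		c := consumers[i%len(consumers)]
		owned[c] = append(owned[c], p)
	}
	return owned
}

func main() {
	owned := assignRoundRobin([]string{"c1", "c2"}, []int{0, 1, 2, 3, 4})
	fmt.Println(owned["c1"], owned["c2"]) // prints [0 2 4] [1 3]
}
```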
Blue / Green Deployments
Jim Plush & Sean Berry from CrowdStrike => https://www.youtube.com/watch?v=abK2Q_aecxY
Offset Management / Bookkeeping
● At-least-once processing guarantee
● Pluggable
● Per-partition batch commit
● Exactly-once processing happens after work is done
● Always commits highest offset
Consumer Example
https://github.com/stealthly/go_kafka_client/blob/master/consumers/consumers.go
func main() {
	config, topic, numConsumers, graphiteConnect, graphiteFlushInterval := resolveConfig()
	consumers := make([]*kafkaClient.Consumer, numConsumers)
	for i := 0; i < numConsumers; i++ {
		consumers[i] = startNewConsumer(*config, topic)
	}
}

func startNewConsumer(config kafkaClient.ConsumerConfig, topic string) *kafkaClient.Consumer {
	config.Strategy = GetStrategy(config.Consumerid)
	config.WorkerFailureCallback = FailedCallback
	config.WorkerFailedAttemptCallback = FailedAttemptCallback
	consumer := kafkaClient.NewConsumer(&config)
	topics := map[string]int{topic: config.NumConsumerFetchers}
	go func() {
		consumer.StartStatic(topics)
	}()
	return consumer
}
Consumer Example
func GetStrategy(consumerId string) func(*kafkaClient.Worker, *kafkaClient.Message, kafkaClient.TaskId) kafkaClient.WorkerResult {
	consumeRate := metrics.NewRegisteredMeter(fmt.Sprintf("%s-ConsumeRate", consumerId), metrics.DefaultRegistry)
	return func(_ *kafkaClient.Worker, msg *kafkaClient.Message, id kafkaClient.TaskId) kafkaClient.WorkerResult {
		kafkaClient.Tracef("main", "Got a message: %s", string(msg.Value))
		consumeRate.Mark(1)
		return kafkaClient.NewSuccessfulResult(id)
	}
}

func FailedCallback(wm *kafkaClient.WorkerManager) kafkaClient.FailedDecision {
	kafkaClient.Info("main", "Failed callback")
	return kafkaClient.DoNotCommitOffsetAndStop
}

func FailedAttemptCallback(task *kafkaClient.Task, result kafkaClient.WorkerResult) kafkaClient.FailedDecision {
	kafkaClient.Info("main", "Failed attempt")
	return kafkaClient.CommitOffsetAndContinue
}
Distributed Reactive Streams
Questions?
/*******************************************
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop
********************************************/
