Jay Kreps
Introduction to Apache Kafka
The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing
Apache Kafka
A brief history of Apache Kafka
Characteristics
• Scalability of a filesystem
– Hundreds of MB/sec/server throughput
– Many TB per server
• Guarantees of a database
– Messages strictly ordered
– All data persistent
• Distributed by default
– Replication
– Partitioning model
Kafka is about logs
What is a log?
Logs: pub/sub done right
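As a rough illustration of the log idea on these slides, here is a minimal in-memory sketch: an append-only list of records addressed by offset, with each subscriber keeping its own read position. The `Log` and `Subscriber` classes and the subscriber names are hypothetical, invented for this example; this is not Kafka's API or storage format.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Append-only sequence of records; each record is addressed by its offset."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return the offset it was assigned."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records records starting at the given offset."""
        return self.records[offset:offset + max_records]


@dataclass
class Subscriber:
    """A reader that remembers its own position in the log."""
    name: str
    offset: int = 0

    def poll(self, log: Log, max_records: int = 10) -> List[Any]:
        """Read the next batch and advance this subscriber's offset."""
        batch = log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
for event in ["view", "click", "purchase"]:
    log.append(event)

# Two independent subscribers (hypothetical names) read the same log.
fast = Subscriber("search-indexer")
slow = Subscriber("hadoop-loader")

first_batch = fast.poll(log)                 # reads everything written so far
slow_batch = slow.poll(log, max_records=1)   # reads only the first record
log.append("logout")
second_batch = fast.poll(log)                # picks up only the new record
```

Because reading never removes anything, a slow subscriber does not hold back a fast one, and either can re-read from an older offset.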
Partitioning
Nodes Host Many Partitions
Producers Balance Load
Consumers Divide Up Partitions
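The partitioning slides above can be sketched in a few lines, assuming keyed records and a fixed partition count. The CRC32 hash and the round-robin assignment are stand-ins chosen for this example; Kafka's clients use their own hash function and assignment strategies. `NUM_PARTITIONS`, `partition_for`, `assign_partitions`, and the consumer names are all hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash, so equal keys always
    land in the same partition and keep their relative order."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(consumers, num_partitions=NUM_PARTITIONS):
    """Deal partitions out round-robin so every partition has exactly one
    owner within the group of consumers."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


assignment = assign_partitions(["consumer-a", "consumer-b", "consumer-c"])
# consumer-a owns partitions 0 and 3, consumer-b owns 1 and 4, consumer-c owns 2 and 5
```

Ordering is only meaningful within a partition, which is why the producer side hashes on a key rather than spraying records at random when order per key matters.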
End-to-End
Kafka At LinkedIn
• 175 TB of in-flight log data per colo
• Replicated to each datacenter
• Tens of thousands of data producers
• Thousands of consumers
• 7 million messages written/sec
• 35 million messages read/sec
• Hadoop integration
Performance
• Producer (3x replication):
– Async: 786,980 records/sec (75.1 MB/sec)
– Sync: 421,823 records/sec (40.2 MB/sec)
• Consumer:
– 940,521 records/sec (89.7 MB/sec)
• End-to-end latency:
– 2 ms (median)
– 14 ms (99.9th percentile)
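The records/sec and MB/sec figures above are consistent with each other if one assumes 100-byte records and 1 MB = 1024 × 1024 bytes. Both are assumptions made for this check (the slide does not state the record size); the arithmetic is:

```python
RECORD_SIZE_BYTES = 100      # assumption: fixed-size 100-byte records
BYTES_PER_MB = 1024 * 1024   # assumption: MB means 2**20 bytes here


def mb_per_sec(records_per_sec: int, record_size: int = RECORD_SIZE_BYTES) -> float:
    """Convert a record rate into a byte throughput in MB/sec."""
    return records_per_sec * record_size / BYTES_PER_MB


async_mb = round(mb_per_sec(786_980), 1)     # async producer
sync_mb = round(mb_per_sec(421_823), 1)      # sync producer
consumer_mb = round(mb_per_sec(940_521), 1)  # consumer
```

Under those assumptions the three values round to 75.1, 40.2, and 89.7 MB/sec, matching the slide.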
The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing
Data Integration
Maslow’s Hierarchy for Data
New Types of Data
• Database data
– Users, products, orders, etc
• Events
– Clicks, Impressions, Pageviews, etc
• Application metrics
– CPU usage, requests/sec
• Application logs
– Service calls, errors
New Types of Systems
• Live Stores
– Voldemort
– Espresso
– Graph
– OLAP
– Search
– InGraphs
• Offline
– Hadoop
– Teradata
Bad
Good
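The diagrams for these two slides are not part of this transcript. Reading them together with the speaker notes (point-to-point "arrows" versus all data in shared logs), the difference can be sketched as a count of pipelines to build. The source and destination names come from the two preceding slides; the function names are hypothetical, and the interpretation of the diagrams is an assumption.

```python
sources = ["database data", "events", "application metrics", "application logs"]
destinations = ["Voldemort", "Espresso", "Graph", "OLAP",
                "Search", "InGraphs", "Hadoop", "Teradata"]


def point_to_point_pipelines(num_sources: int, num_destinations: int) -> int:
    """Worst case when every source is wired directly to every destination."""
    return num_sources * num_destinations


def log_centric_pipelines(num_sources: int, num_destinations: int) -> int:
    """Each source writes to the central log once; each destination reads once."""
    return num_sources + num_destinations


bad = point_to_point_pipelines(len(sources), len(destinations))   # 4 * 8 = 32
good = log_centric_pipelines(len(sources), len(destinations))     # 4 + 8 = 12
```

Adding one more system costs one new connection in the log-centric layout, instead of one per existing counterpart.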
Example: User views job
Comparing Data Transfer Mechanisms
The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing
Stream Processing
Stream processing is a generalization of batch processing
Stream Processing = Logs + Jobs
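One way to picture "Logs + Jobs": a job reads records from an input log in order and appends results to an output log. The sketch below is a plain-Python tumbling-window count, assuming records are `(timestamp, key)` pairs that arrive in timestamp order. `windowed_counts` and the sample data are hypothetical; this is not the Samza or Kafka API.

```python
from collections import Counter


def windowed_counts(input_log, window_size):
    """Consume (timestamp, key) records in order and emit one output record
    per window: (window_start, {key: count})."""
    output_log = []
    current_window = None
    counts = Counter()
    for timestamp, key in input_log:
        window = timestamp // window_size
        # A record from a later window closes the current one.
        if current_window is not None and window != current_window:
            output_log.append((current_window * window_size, dict(counts)))
            counts = Counter()
        current_window = window
        counts[key] += 1
    # Flush the last window; on an unbounded stream this would instead
    # happen when a record from a later window arrives.
    if current_window is not None:
        output_log.append((current_window * window_size, dict(counts)))
    return output_log


page_views = [(0, "jobs"), (3, "profile"), (7, "jobs"), (12, "jobs"), (14, "search")]
result = windowed_counts(page_views, window_size=10)
```

With a window covering the whole input, the same function behaves like a batch job, which is the sense in which stream processing generalizes batch processing.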
Examples
• Monitoring
• Security
• Content processing
• Recommendations
• Newsfeed
• ETL
Frameworks Can Help
Samza Architecture
Log-centric Architecture
Kafka
http://kafka.apache.org
Samza
http://samza.incubator.apache.org
Log Blog
http://linkd.in/199iMwY
Benchmark:
http://t.co/40fkKJvanx
Me
http://www.linkedin.com/in/jaykreps
@jaykreps

Editor's Notes

  1. Who are you? What is this talk about? Exciting topic More
  2. Messaging system, like JMS (but different!). Producers, consumers; distributed.
  3. Start with state at LinkedIn, describe each pipeline: 1 pipeline for database data; 1 pipeline for metrics; 1 pipeline for events; 1 JMS-based pipeline; no pipeline for application logs; 300 ActiveMQ brokers.
  4. 10,000 messages/sec * 100-byte messages = ~1 MB/sec
  5. The log is the fundamental abstraction Kafka provides. You can use a log as a drop-in replacement for a messaging system, but it can also do a lot more.
  6. What is a log? Traditional uses? Non-traditional uses…
  7. Time-ordered. Semi-structured.
  8. A data structure, not a text file. A list of changes. The contents of a record don’t matter. Indexed by “time”. Not an application log (i.e. a text file).
  9. Remotely accessible. State machine replication.
  10. Data model of Kafka: a topic. Partitions can be spread over machines and replicated.
  11. Path of a write. Leadership failover. Guarantees.
  12. AKA ETL. Many systems. Event data. Most important problem for data-centric companies. Integration >> ML.
  13. Maslow’s Hierarchy: Abraham Maslow, psychologist, 1943. Physiological – eat, drink, sleep. Safety – not being attacked. Love/Belonging – friends and family. Esteem – respect of others. Self-Actualization – morality, creativity, spontaneity.
  14. Want to do deep learning. Instead finding that their CSV data ALSO has commas in it. Copying files around. Ugh. The Caveman. Data warehousing has a bad reputation.
  15. Two exacerbating factors. 15 years ago, just the first one (transactional data). New categories are very high volume, maybe 100x the transactional data. Look like events. Internet of things.
  16. One size fits all
  17. Tell story: Started with Hadoop, added arrows to get data there. Want to build fancy algorithms, need data (expectation: 90% of time for fancy, 10% for data). Holy shit, this is hard! Data is missing, data is late, computation runs on wrong data. Hadoop without good data is just a very expensive space heater. Never get to full connectivity.
  18. Metcalfe’s law. Each new system connects to get/give data. All data in multi-subscriber, real-time logs. The company is a big distributed system. The data center is the distributed system.
  19. Three dimensions: throughput, guarantees, latency. Advantages over messaging: huge data backlog, order. Advantage over files: real-time. Advantage over both: principled notion of time.
  20. Whole organization is a big distributed system. Commit log = data transfer. Stream processing = triggers. Batch is the dominant paradigm for data processing. Why?
  21. Service: one input = one output. Batch job: all inputs = all outputs. Stream computing: any window = output for that window.
  22. No different from a batch processing flow (logs instead of files/tables).
  23. Storm and Samza. About process management – both integrate with Kafka. MapReduce and HDFS.