Apache Kafka
Shravan (Sean) Pabba
Apache Kafka Deck used at NJ Hadoop meetup session on 8/11/2015
1.
Apache Kafka - Ingestion and Processing Pipeline
NJ Hadoop Meetup – 8/11/15
Shravan Pabba @skpabba
© Cloudera, Inc. All rights reserved.
2.
Agenda
• Kafka Concepts and Architecture
• Kafka vs Traditional messaging systems
• Kafka with Cloudera
• Demo
  § Install and configure Kafka on Cloudera cluster
  § Client tools - Add and consume data from topics
  § Replication and Failover capabilities
  § Flume Integration and demo of Kafka to Flume to HDFS
• Other topics
3.
About Me
• Systems Engineer @ Cloudera
• Previously Pre/Post Sales Architect @ GigaSpaces, IBM
• Mainframes, Client/Server, Distributed & Cloud
4.
Kafka Concepts and Architecture
5.
Typical Data Hub Architecture
[Diagram: Cloudera Enterprise Data Hub under Cloudera Manager. Labels in order of appearance: Ingestion; Kafka, Flume, Spark Streaming, DistCp, Sqoop, File Dumping; Access Layer; Interactive, JDBC, ODBC, ETL, Hive, Spark DAG, MLlib, Giraph, Grid Compute, Custom; Egress; DistCp, Producer, File Dumping, Kafka/Custom, Custom HBase API, Solr; Engines; Storage Layer; HDFS, HBase, Solr, YARN, Spark, MapReduce, Impala; Sentry (Security Framework), Encryption, Navigator, Pig.]
6.
Why Kafka? (Or rather, why didn't LinkedIn use Flume?)
• No ability to replay events
• Multiple sinks require event replication (via multiple channels)
• Sinks that share a source (mostly) process events in sync
• This is tight coupling
[Diagram: Flume agents for "Logs" and "More Logs", each with a Spool Source, a Channel, and an Avro Sink, feeding an Avro Source whose Channels lead to an HBase Sink and an HDFS Sink, and on to HBase and HDFS.]
7.
Why Kafka?
[Diagram, 2009: Web logs feed Hadoop. Connections = O(1).]
8.
Why Kafka? Increasing complexity
[Diagram, 2009: Web logs feed Hadoop. Connections = O(1).]
[Diagram, 2014: Transactions, Metrics, Web logs, Hadoop, Warehouse, Alerting, Audit Logs, and Security wired point to point. Connections = O(Systems²).]
9.
Why Kafka? Decoupling
[Diagram, 2014: Transactions, Metrics, Web logs, Hadoop, Warehouse, Alerting, Audit Logs, and Security wired point to point. Connections = O(Systems²).]
[Diagram, 2015+?: the same systems connected through Kafka. Connections = O(Systems).]
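The connection counts on this slide can be checked with a little arithmetic. The sketch below assumes the worst case in which every system may need a directed link to every other system, which simplifies the diagram; the function names are illustrative and not part of any Kafka API.

```python
def point_to_point_links(systems: int) -> int:
    """Directed links when every system connects to every other one: O(n^2)."""
    return systems * (systems - 1)


def hub_links(systems: int) -> int:
    """Links when every system connects only to a central log: O(n)."""
    return systems


if __name__ == "__main__":
    # The slide names eight systems (Transactions, Metrics, Web logs, ...).
    for n in (2, 8, 50):
        print(n, point_to_point_links(n), hub_links(n))
```

For the eight systems named on the slide this gives 56 possible point-to-point links against 8 links through a shared hub.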
10.
Why Kafka? Because logs.
• Distributed, structured logs are very useful
• Resiliency / replication
• Database write-ahead logs (HBase WAL, Oracle Redo-logs, etc)
• System decoupling
• Enterprise service buses (ESBs)
• Data integration (change data capture)
• Stream processing (e.g. real-time alerts)
• Consensus (using logical clocks)
11.
What is Kafka?
• Kafka is …
[Diagram: Transactions, Metrics, Web logs, Hadoop, Warehouse, Alerting, Audit Logs, and Security connected through Kafka.]
12.
What is Kafka?
• Kafka is a distributed, …
[Diagram: Transactions, Metrics, Web logs, Hadoop, Warehouse, Alerting, Audit Logs, and Security connected through Kafka, which consists of three Brokers.]
13.
What is Kafka?
• Kafka is a distributed, topic-oriented, …
[Diagram: Source 1, Source 2, and Source 3; a Broker holding Topic 1 and Topic 2; Sink 1 and Sink 2.]
14.
What is Kafka?
• Kafka is a distributed, topic-oriented, partitioned, …
[Diagram: Source 1, Source 2, and Source 3; one Broker holding Topic 1 Partition 1 and Topic 2 Partition 1; a second Broker holding Topic 1 Partition 2 and Topic 2 Partition 2; Sink 1 and Sink 2.]
15.
What is Kafka?
• Kafka is a distributed, topic-oriented, partitioned, replicated commit log.
[Diagram: Source 1, Source 2, and Source 3; two Brokers, each holding its own partitions of Topic 1 and Topic 2 plus replicas of the other Broker's partitions; Sink 1 and Sink 2.]
16.
What is Kafka?
• Kafka is a distributed, topic-oriented, partitioned, replicated commit log.
• Kafka is also a pub-sub messaging system.
• Messages can be text (e.g. syslog), but binary is best (preferably Avro!).
[Diagram: Source 1, Source 2, and Source 3; two Brokers, each holding its own partitions of Topic 1 and Topic 2 plus replicas of the other Broker's partitions; Sink 1 and Sink 2.]
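To make "topic-oriented, partitioned commit log" concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not Kafka's implementation: `ToyTopic` is a hypothetical class, and CRC32 is used only as a convenient stable hash (real clients ship their own partitioners).

```python
import zlib
from typing import List, Tuple


class ToyTopic:
    """In-memory stand-in for a topic: a fixed set of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Tuple[bytes, bytes]]] = [
            [] for _ in range(num_partitions)
        ]

    def partition_for(self, key: bytes) -> int:
        # A stable hash sends the same key to the same partition every time.
        # Assumption: CRC32 is for illustration, not what Kafka clients use.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


if __name__ == "__main__":
    topic = ToyTopic("web-logs", num_partitions=2)
    print(topic.append(b"user-1", b"login"))
    print(topic.append(b"user-1", b"click"))
```

Because both messages share a key, they land in the same partition with consecutive offsets, so ordering holds per key within a partition rather than across the whole topic.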
17.
Architectural Overview
• Each machine is called a Broker
• Data written belongs to Topics (analogous to a Table in a database)
• Each Topic is partitioned
• Partitions are distributed across the Brokers
• Partitions are also replicated (one replica per partition is the Leader Partition)
• Producers and Consumers talk to the Leader Partition
[Diagram: Kafka Cluster with Producers and Consumers]
Broker 1: Partition 1 (Leader), Partition 2, Partition 3
Broker 2: Partition 2 (Leader), Partition 1, Partition 3
Broker 3: Partition 3 (Leader), Partition 1, Partition 2
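The broker layout in this diagram can be reproduced with a simple round-robin placement. This is a minimal sketch: `assign_replicas` and `fail_over` are hypothetical helpers, and real Kafka elects leaders through a controller using the in-sync replica set rather than by picking the next broker in a list.

```python
from typing import Dict, List


def assign_replicas(brokers: List[int], num_partitions: int,
                    replication_factor: int) -> Dict[int, List[int]]:
    """Round-robin placement; the first broker in each list acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    assignment = {}
    for p in range(1, num_partitions + 1):
        assignment[p] = [brokers[(p - 1 + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment


def fail_over(assignment: Dict[int, List[int]],
              dead_broker: int) -> Dict[int, List[int]]:
    """Drop a failed broker; the next surviving replica becomes leader."""
    return {p: [b for b in replicas if b != dead_broker]
            for p, replicas in assignment.items()}


if __name__ == "__main__":
    layout = assign_replicas([1, 2, 3], num_partitions=3, replication_factor=3)
    print(layout)                # leaders are brokers 1, 2, 3
    print(fail_over(layout, 1))  # partition 1 is now led by broker 2
```

With three brokers, three partitions, and a replication factor of three, each broker leads one partition and holds follower replicas of the other two, as in the diagram.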
18.
The Kafka Advantage
High-Throughput & Low Latency
• One broker can handle 100s of MB of reads/writes per second, from 1000s of clients
• Messages delivered in milliseconds
Durability & Reliability
• Zero data loss with messages persisted on disk and replicated within the cluster
• Highly available, with fault tolerance built into the system
Scalability & Flexibility
• Elastically and transparently add more machines without downtime for horizontal scalability
• Dynamically add Producers & Consumers
• Enable real-time & batch consumption
Cost-Efficient
• Modest cluster optimized to handle millions of messages per second
• Open standard for long-term value
• With Cloudera, a single system for multiple workloads
19.
How does it compare to Flume and Traditional Messaging
20.
Kafka vs Flume
Kafka
• Kafka is very much a general-purpose system: many producers and many consumers sharing multiple topics
• Kafka has a significantly smaller producer and consumer ecosystem
• Kafka requires an external stream processing system for in-flight data processing
• Highly available ingest pipeline
Flume
• Flume is a special-purpose tool designed to send data to HDFS, HBase (and Solr)
• Flume has many built-in sources and sinks
• In-flight data processing using interceptors. Useful for data masking or filtering
• Flume does not replicate events
21.
Random and Sequential Access in Disk and Memory
[Chart]
Source: http://queue.acm.org/detail.cfm?id=1563874
22.
Why is Kafka fast?
Kafka
• Kafka does only sequential file I/O
• Kafka keeps a single pointer into each partition of a topic. All messages prior to the pointer are considered consumed, and all messages after it are considered unconsumed
• Relies heavily on OS pagecache for data storage, zerocopy
• No GC, no memory overhead
• Kafka supports end-to-end batching and compression of messages
Traditional Messaging
• Traditional messaging does random file/memory I/O (BTree structures)
• Messaging systems typically keep some kind of per-message state about what has been consumed and have to update it
• Disk/memory is used for storage
• JVM == GC and memory overhead
• Traditional messaging is typically non-batched and uncompressed
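The "single pointer" idea can be sketched in a few lines. This is a toy model under stated assumptions: `ToyPartition` and `ToyConsumer` are hypothetical names, the log lives in memory rather than in files backed by the OS pagecache, and no networking is involved.

```python
from typing import List


class ToyPartition:
    """Append-only list standing in for one partition's log."""

    def __init__(self) -> None:
        self.log: List[bytes] = []

    def append(self, message: bytes) -> int:
        self.log.append(message)
        return len(self.log) - 1  # offset of the new message

    def read(self, offset: int, max_messages: int) -> List[bytes]:
        # Sequential slice; nothing is deleted or marked when it is read.
        return self.log[offset:offset + max_messages]


class ToyConsumer:
    """Tracks consumption with one integer instead of per-message state."""

    def __init__(self, partition: ToyPartition) -> None:
        self.partition = partition
        self.offset = 0  # everything before this is considered consumed

    def poll(self, max_messages: int = 10) -> List[bytes]:
        batch = self.partition.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Replaying old messages is just moving the pointer back.
        self.offset = offset


if __name__ == "__main__":
    partition = ToyPartition()
    for msg in (b"a", b"b", b"c"):
        partition.append(msg)
    consumer = ToyConsumer(partition)
    print(consumer.poll(2))  # first batch
    consumer.seek(0)
    print(consumer.poll())   # replay from the beginning
```

Since reading never mutates the log, several consumers can hold different offsets over the same partition, and replay (which the earlier Flume slide lists as missing) costs nothing more than resetting an integer.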
23.
Canonical Use Cases
• Real-Time Stream Processing
• General-Purpose Message Bus
• User Activity Data Collection
• Operational Metrics Collection (applications, servers, or devices)
• Log Aggregation
• Change Data Capture
• Distributed Systems Commit Log
24.
Kafka and Cloudera
25.
Simplified Management
• Deploy and Configure Kafka clusters
• Unified Management
• Multiple Kafka clusters
• Entire platform
• Monitoring, Alerts, and Dashboards
26.
Configure Kafka using CM
27.
CM has much more!
28.
CM has much more!
29.
CM has much more!
30.
Kafka + Apache Flume
• Kafka can be configured as a fast, reliable Flume Channel
• Flume Sources and Sinks can be used as out-of-the-box Kafka Producers and Consumers
Flume Sinks Consume from Kafka: write data to HDFS, HBase, or Search
Flume Sources Write to Kafka: read from logs, files, JMS, HTTP, RPC, Thrift, etc. and write events to Kafka
31.
Cloudera + Kafka
Community involvement and contribution:
• Spearheading adding security features to Kafka
• Identified and fixed core architectural issues to make Kafka fully reliable
• Strong relationship with Confluent.io and other Kafka Committers
Support expertise and experience:
• Multiple production customers
• Support team trained by Kafka Committers
Integrated with Cloudera's production-ready platform:
• Cloudera Manager CSD makes it easy to deploy, configure, and monitor Kafka clusters
• End-to-end workloads with other components, all on a single system
• Leading security, governance, administration, and partner network
32.
Roadmap
Security:
• Authentication with Kerberos
• Topic level Authorization
• SSL encryption of data over-the-wire
• Improved Cloudera Manager integration
• HUE integration
*Roadmap is subject to change
33.
Demo
34.
Kafka Demo
• Install and configure Kafka on Cloudera cluster
• Client tools - Add and consume data from topics
• Replication and Failover capabilities
• Flume Integration and demo of Kafka to Flume to HDFS
35.
Other Topics
36.
Clients/APIs
• Java, Python, Go, C/C++, .Net, Clojure, Ruby, Erlang, stdin/stdout, and more here: https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-ProducerDaemon
• Producer and Consumer API
• A new Java Producer API was introduced in 0.8.2
• A new consumer API is coming in the next release
37.
Mirror Maker
• Multi Kafka Cluster replication, HA across datacenters
38.
Camus/Samza/Kafka Manager
• Camus and Samza are tools created and used at LinkedIn
• Camus is a client for ingesting Kafka data into Hadoop (MR jobs under the covers)
• Camus is being phased out and replaced with Gobblin
• Samza is a stream processing framework that uses Kafka for messaging and YARN for processing (resource management etc)
• Kafka Manager is a management tool for Kafka developed at Yahoo
39.
Thank You