Apache Kafka - Ingestion and Processing Pipeline
NJ Hadoop Meetup – 8/11/15
Shravan Pabba @skpabba
© Cloudera, Inc. All rights reserved.
  
Agenda
•  Kafka Concepts and Architecture
•  Kafka vs Traditional messaging systems
•  Kafka with Cloudera
•  Demo
   § Install and configure Kafka on Cloudera cluster
   § Client tools - Add and consume data from topics
   § Replication and Failover capabilities
   § Flume Integration and demo of Kafka to Flume to HDFS
•  Other topics
  
About Me
•  Systems Engineer @ Cloudera
•  Previously Pre/Post Sales Architect @ GigaSpaces, IBM
•  Mainframes, Client/Server, Distributed & Cloud
  
Kafka Concepts and Architecture
  
Typical Data Hub Architecture
[Diagram: Cloudera Enterprise Data Hub, managed by Cloudera Manager]
Ingestion: Kafka, Flume, Spark Streaming, DistCp, Sqoop, File Dumping
Access Layer: Interactive (JDBC, ODBC), ETL, Hive, Pig, Spark DAG, MLlib, Giraph, Grid Compute, Custom
Egress: DistCp, Kafka/Custom Producer, File Dumping, Custom HBase API, Solr
Engines: Spark, MapReduce, Impala, on YARN
Storage Layer: HDFS, HBase, Solr
Also shown: Sentry (Security Framework), Encryption, Navigator
  
Why Kafka? (Or rather, why didn’t LinkedIn use Flume?)
•  No ability to replay events
•  Multiple sinks require event replication (via multiple channels)
•  Sinks that share a source (mostly) process events in sync
•  This is tight coupling
[Diagram: Flume topology. Two agents (Spool Source, Channel, Avro Sink) read Logs and More Logs and forward to an Avro Source, which feeds separate Channels for an HBase Sink and an HDFS Sink, writing to HBase and HDFS.]

Why Kafka?
[Diagram, 2009: Web logs → Hadoop. Connections = O(1)]
  
Why Kafka? Increasing complexity
[Diagram, 2009: Web logs → Hadoop. Connections = O(1)]
[Diagram, 2014: Transactions, Metrics, Web logs, and Audit Logs each connect directly to Hadoop, Warehouse, Alerting, and Security. Connections = O(Systems²)]
  
Why Kafka? Decoupling
[Diagram, 2014: Transactions, Metrics, Web logs, and Audit Logs each connect directly to Hadoop, Warehouse, Alerting, and Security. Connections = O(Systems²)]
[Diagram, 2015+?: the same sources and destinations each connect only to Kafka. Connections = O(Systems)]
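The connection counts on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes every source talks to every destination in the point-to-point case, which is the worst case the O(Systems²) label describes.

```python
def point_to_point_connections(num_sources: int, num_sinks: int) -> int:
    """Every source connects directly to every sink (worst case)."""
    return num_sources * num_sinks


def hub_connections(num_sources: int, num_sinks: int) -> int:
    """Every system connects once to a central hub (Kafka on the slide)."""
    return num_sources + num_sinks


if __name__ == "__main__":
    # Compare how the two layouts grow as systems are added.
    for n in (1, 4, 10, 50):
        direct = point_to_point_connections(n, n)
        hub = hub_connections(n, n)
        print(f"{n} sources, {n} sinks: direct={direct}, via hub={hub}")
```

With the four sources and four destinations drawn on the slide, that is 16 direct connections against 8 through a hub, and the gap widens quadratically as systems are added.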
  
Why Kafka? Because logs.
• Distributed, structured logs are very useful
• Resiliency / replication
•  Database write-ahead logs (HBase WAL, Oracle redo logs, etc.)
• System decoupling
•  Enterprise service buses (ESBs)
•  Data integration (change data capture)
• Stream processing (e.g. real-time alerts)
• Consensus (using logical clocks)
  
What is Kafka?
•  Kafka is …
[Diagram: Transactions, Metrics, Web logs, and Audit Logs flow through Kafka to Hadoop, Warehouse, Alerting, and Security.]

What is Kafka?
•  Kafka is a distributed, …
[Diagram: the same flow, with Kafka drawn as three Brokers.]

What is Kafka?
•  Kafka is a distributed, topic-oriented, …
[Diagram: Source 1, Source 2, and Source 3 write to Topic 1 and Topic 2 on a Broker; Sink 1 and Sink 2 read from the topics.]

What is Kafka?
•  Kafka is a distributed, topic-oriented, partitioned, …
[Diagram: Topic 1 and Topic 2 are each split into Partition 1 and Partition 2, spread across two Brokers, between Sources 1-3 and Sinks 1-2.]

What is Kafka?
•  Kafka is a distributed, topic-oriented, partitioned, replicated commit log.
[Diagram: as on the previous slide, with every partition of Topic 1 and Topic 2 also replicated on the other Broker.]

What is Kafka?
•  Kafka is a distributed, topic-oriented, partitioned, replicated commit log.
•  Kafka is also a pub-sub messaging system.
•  Messages can be text (e.g. syslog), but binary is best (preferably Avro!).
[Diagram: Sources 1-3 and Sinks 1-2 around two Brokers, each holding partitions of Topic 1 and Topic 2 plus replicas of the other Broker's partitions.]

Architectural Overview
•  Each machine is called a Broker
•  Data written belongs to Topics (analogous to a Table in a database)
•  Each Topic is partitioned
•  Partitions are distributed across the Brokers
•  Partitions are also replicated (one replica per partition is the Leader Partition)
•  Producers and Consumers talk to the Leader Partition
[Diagram: Kafka Cluster with Producers and Consumers]
Broker 1: Partition 1 (Leader), Partition 2, Partition 3
Broker 2: Partition 2 (Leader), Partition 1, Partition 3
Broker 3: Partition 3 (Leader), Partition 1, Partition 2
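The leader layout on this slide can be reproduced with a small sketch. This is a simplified illustration, not Kafka's actual assignment or election logic: the function names are hypothetical, placement is plain round-robin, and failover simply promotes the first surviving replica, whereas a real cluster chooses a new leader from the in-sync replicas.

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, brokers: List[str],
                    replication_factor: int) -> Dict[int, List[str]]:
    """Return {partition: [leader, follower, ...]} using round-robin placement."""
    if not 1 <= replication_factor <= len(brokers):
        raise ValueError("replication factor must be between 1 and the broker count")
    assignment = {}
    for p in range(num_partitions):
        # The first broker in each list acts as the leader for that partition.
        assignment[p + 1] = [brokers[(p + i) % len(brokers)]
                             for i in range(replication_factor)]
    return assignment


def leader_after_failure(replicas: List[str], failed: str) -> str:
    """Promote the first surviving replica (simplified failover)."""
    survivors = [b for b in replicas if b != failed]
    if not survivors:
        raise RuntimeError("no surviving replica")
    return survivors[0]


if __name__ == "__main__":
    layout = assign_replicas(3, ["Broker 1", "Broker 2", "Broker 3"], 3)
    for partition, replicas in layout.items():
        print(f"Partition {partition}: leader={replicas[0]}, followers={replicas[1:]}")
    # If Broker 1 fails, Partition 1 needs a new leader.
    print("New leader for Partition 1:", leader_after_failure(layout[1], "Broker 1"))
```

Spreading the leaders across brokers matters because producers and consumers only talk to the leader, so the layout also spreads client load.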
  
The Kafka Advantage
High-Throughput & Low Latency
•  One broker can handle hundreds of megabytes of reads/writes per second from thousands of clients
•  Messages delivered in milliseconds
Durability & Reliability
•  Zero data loss with messages persisted on disk and replicated within the cluster
•  Highly available, with fault tolerance built into the system
Scalability & Flexibility
•  Elastically and transparently add more machines without downtime for horizontal scalability
•  Dynamically add Producers & Consumers
•  Enable real-time & batch consumption
Cost-Efficient
•  Modest cluster optimized to handle millions of messages per second
•  Open standard for long-term value
•  With Cloudera, a single system for multiple workloads
  
How does it compare to Flume and Traditional Messaging
  
Kafka Vs Flume
Kafka
•  Kafka is very much a general-purpose system: many producers and many consumers sharing multiple topics
•  Kafka has a significantly smaller producer and consumer ecosystem
•  Kafka requires an external stream processing system for in-flight data processing
•  Highly available ingest pipeline
Flume
•  Flume is a special-purpose tool designed to send data to HDFS, HBase (and Solr)
•  Flume has many built-in sources and sinks
•  In-flight data processing using interceptors. Useful for data masking or filtering
•  Flume does not replicate events
  
Random and Sequential Access in Disk and Memory
[Chart not preserved in this transcript]
Source: http://queue.acm.org/detail.cfm?id=1563874
  
Why is Kafka fast?
Kafka
•  Kafka does only sequential file I/O
•  Kafka keeps a single pointer into each partition of a topic. All messages prior to the pointer are considered consumed, and all messages after it are considered unconsumed
•  Relies heavily on the OS page cache for data storage, and on zero-copy
•  No GC, no memory overhead
•  Kafka supports end-to-end batching and compression of messages
Traditional Messaging
•  Traditional messaging does random file/memory I/O (BTree structures)
•  Messaging systems typically keep some kind of per-message state about what has been consumed, and have to update it
•  Disk/memory is used for storage
•  JVM == GC and memory overhead
•  Traditional messaging is typically non-batched and uncompressed
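The single-pointer idea can be sketched in a few lines. This is a toy in-memory model, not Kafka's implementation: `PartitionLog` and `Consumer` are hypothetical names, and a Python list stands in for the on-disk log. It shows that consumption state is one integer per partition, and that replaying events is only a matter of moving that integer back.

```python
from typing import List


class PartitionLog:
    """Append-only list standing in for one partition's log file."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        """Append to the end of the log and return the message's offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int) -> List[bytes]:
        """Sequentially read up to max_messages starting at offset."""
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Tracks consumption with one integer, not per-message flags."""

    def __init__(self, log: PartitionLog) -> None:
        self.log = log
        self.offset = 0  # everything before this offset counts as consumed

    def poll(self, max_messages: int = 10) -> List[bytes]:
        """Read the next batch and advance the pointer past it."""
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the pointer back to replay, or forward to skip."""
        self.offset = offset


if __name__ == "__main__":
    log = PartitionLog()
    for payload in (b"a", b"b", b"c"):
        log.append(payload)
    consumer = Consumer(log)
    print(consumer.poll(2))   # first two messages
    consumer.seek(0)          # rewind to replay from the start
    print(consumer.poll())    # all three messages again
```

Because the log itself is never modified by reads, several consumers can each hold their own offset over the same data without coordinating with one another.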
  
Canonical Use Cases
•  Real-Time Stream Processing
•  General-Purpose Message Bus
•  User Activity Data Collection
•  Operational Metrics Collection (applications, servers, or devices)
•  Log Aggregation
•  Change Data Capture
•  Distributed Systems Commit Log
  
Kafka and Cloudera
  
Simplified Management
•  Deploy and Configure Kafka clusters
•  Unified Management
   •  Multiple Kafka clusters
   •  Entire platform
•  Monitoring, Alerts, and Dashboards
  
Configure Kafka using CM
  
CM has much more!
  
Kafka + Apache Flume
•  Kafka can be configured as a fast, reliable Flume Channel
•  Flume Sources and Sinks can be used as out-of-the-box Kafka Producers and Consumers
Flume Sinks Consume from Kafka: Write data to HDFS, HBase, or Search
Flume Sources Write to Kafka: Read from logs, files, JMS, HTTP, RPC, Thrift, etc. and write events to Kafka
  
Cloudera + Kafka
Community involvement and contribution:
•  Spearheading the addition of security features to Kafka
•  Identified and fixed core architectural issues to make Kafka fully reliable
•  Strong relationship with Confluent.io and other Kafka Committers
Support expertise and experience:
•  Multiple production customers
•  Support team trained by Kafka Committers
Integrated with Cloudera’s production-ready platform:
•  Cloudera Manager CSD makes it easy to deploy, configure, and monitor Kafka clusters
•  End-to-end workloads with other components, all on a single system
•  Leading security, governance, administration, and partner network
  
Roadmap
Security:
• Authentication with Kerberos
• Topic-level Authorization
• SSL encryption of data over the wire
• Improved Cloudera Manager integration
• HUE integration
*Roadmap is subject to change
  
Demo
  
Kafka Demo
•  Install and configure Kafka on Cloudera cluster
•  Client tools - Add and consume data from topics
•  Replication and Failover capabilities
•  Flume Integration and demo of Kafka to Flume to HDFS
  
Other Topics
  
Clients/APIs
•  Java, Python, Go, C/C++, .Net, Clojure, Ruby, Erlang, stdin/stdout, and more here: https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-ProducerDaemon
•  Producer and Consumer API
•  New Java Producer API was introduced in 0.8.2
•  New consumer API is coming in the next release
  
Mirror Maker
•  Replication across multiple Kafka clusters, HA across datacenters
  
Camus/Samza/Kafka Manager
•  Camus and Samza are tools created and used at LinkedIn
•  Camus is a client for ingesting Kafka data into Hadoop (MR jobs under the covers)
•  Camus is being phased out and replaced with Gobblin
•  Samza is a stream processing framework that uses Kafka for messaging and YARN for processing (resource management etc.)
•  Kafka Manager is a management tool for Kafka developed at Yahoo
  
Thank You
  

Mais conteúdo relacionado

Mais procurados

Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka confluent
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practicesconfluent
 
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...HostedbyConfluent
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpHostedbyConfluent
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Timothy Spann
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentHostedbyConfluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developersconfluent
 
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...confluent
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...confluent
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 

Mais procurados (20)

Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Securing Kafka
Securing Kafka Securing Kafka
Securing Kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
 
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
Kafka Cluster Federation at Uber (Yupeng Fui & Xiaoman Dong, Uber) Kafka Summ...
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 

Destaque

Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexasArvind Prabhakar
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in RRevolution Analytics
 
I Should Have Used Social Selling | Gil Gunderson's Guide To Social Sales
I Should Have Used Social Selling | Gil Gunderson's Guide To Social SalesI Should Have Used Social Selling | Gil Gunderson's Guide To Social Sales
I Should Have Used Social Selling | Gil Gunderson's Guide To Social SalesGerry Moran
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupGwen (Chen) Shapira
 
Real-Time Fraud Detection with Storm and Kafka
Real-Time Fraud Detection with Storm and KafkaReal-Time Fraud Detection with Storm and Kafka
Real-Time Fraud Detection with Storm and KafkaAlexey Kharlamov
 
HBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpFwardNetwork
 
Introduction To Kibana
Introduction To KibanaIntroduction To Kibana
Introduction To KibanaJen Stirrup
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Spark Streamingによるリアルタイムユーザ属性推定
Spark Streamingによるリアルタイムユーザ属性推定Spark Streamingによるリアルタイムユーザ属性推定
Spark Streamingによるリアルタイムユーザ属性推定Yoshiyasu SAEKI
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
 
Kibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackKibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackSylvain Wallez
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 

Destaque (20)

Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in R
 
I Should Have Used Social Selling | Gil Gunderson's Guide To Social Sales
I Should Have Used Social Selling | Gil Gunderson's Guide To Social SalesI Should Have Used Social Selling | Gil Gunderson's Guide To Social Sales
I Should Have Used Social Selling | Gil Gunderson's Guide To Social Sales
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Apache Flume (NG)
Apache Flume (NG)Apache Flume (NG)
Apache Flume (NG)
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
'Flume' Case Study
'Flume' Case Study'Flume' Case Study
'Flume' Case Study
 
Apache flume
Apache flumeApache flume
Apache flume
 
Real-Time Fraud Detection with Storm and Kafka
Real-Time Fraud Detection with Storm and KafkaReal-Time Fraud Detection with Storm and Kafka
Real-Time Fraud Detection with Storm and Kafka
 
Kibana
KibanaKibana
Kibana
 
HBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejp
 
Apache Kafka Security
Apache Kafka Security Apache Kafka Security
Apache Kafka Security
 
Introduction To Kibana
Introduction To KibanaIntroduction To Kibana
Introduction To Kibana
 
Inside Flume
Inside FlumeInside Flume
Inside Flume
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Spark Streamingによるリアルタイムユーザ属性推定
Spark Streamingによるリアルタイムユーザ属性推定Spark Streamingによるリアルタイムユーザ属性推定
Spark Streamingによるリアルタイムユーザ属性推定
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
Kibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackKibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stack
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 

Semelhante a Apache kafka

End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaGrant Henke
 
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Data Con LA
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Pat Patterson
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Pat Patterson
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataUsing Kafka and Kudu for fast, low-latency SQL analytics on streaming data
Using Kafka and Kudu for fast, low-latency SQL analytics on streaming dataMike Percy
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Hadoop 3 (2017 hadoop taiwan workshop)
Hadoop 3 (2017 hadoop taiwan workshop)Wei-Chiu Chuang
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Kafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User GroupKafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User GroupJeff Holoman
 

Semelhante a Apache kafka (20)

End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Apache Kafka

  • 1. Apache Kafka - Ingestion and Processing Pipeline
      NJ Hadoop Meetup, 8/11/15
      Shravan Pabba @skpabba
      © Cloudera, Inc. All rights reserved.
  • 2. Agenda
      • Kafka Concepts and Architecture
      • Kafka vs Traditional messaging systems
      • Kafka with Cloudera
      • Demo
          § Install and configure Kafka on Cloudera cluster
          § Client tools - Add and consume data from topics
          § Replication and Failover capabilities
          § Flume Integration and demo of Kafka to Flume to HDFS
      • Other topics
  • 3. About Me
      • Systems Engineer @ Cloudera
      • Previously Pre/Post Sales Architect @ GigaSpaces, IBM
      • Mainframes, Client/Server, Distributed & Cloud
  • 4. Kafka Concepts and Architecture
  • 5. Cloudera Enterprise Data Hub: Typical Data Hub Architecture
      [Diagram labels: Cloudera Manager; Ingestion: Kafka, Flume, Spark Streaming, DistCp, Sqoop, File Dumping; Access Layer: Interactive, JDBC, ODBC, ETL, Hive, Spark DAG, MLlib, Giraph, Grid Compute, Custom; Egress: DistCp, Producer, File Dumping, Kafka/Custom, Custom, HBase API, SolR; Engines and Storage Layer: HDFS, HBase, SolR, Yarn, Spark, Map Reduce, Impala, PIG; Sentry (Security Framework), Encryption, Navigator]
  • 6. Why Kafka? (Or rather, why didn't LinkedIn use Flume?)
      • No ability to replay events
      • Multiple sinks require event replication (via multiple channels)
      • Sinks that share a source (mostly) process events in sync
      • This is tight coupling
      [Diagram: Flume topology. Logs and More Logs feed Spool Sources; each passes through a Channel to an Avro Sink; an Avro Source then fans out through Channels to an HBase Sink and an HDFS Sink, which write to HBase and HDFS]
  • 7. Why Kafka?
      [Diagram, 2009: Web logs to Hadoop. Connections = O(1)]
  • 8. Why Kafka? Increasing complexity
      [Diagram, 2009: Web logs to Hadoop. Connections = O(1)]
      [Diagram, 2014: Transactions, Metrics, Web logs, Hadoop, Warehouse, Alerting, Audit Logs, Security, all connected point to point. Connections = O(Systems^2)]
  • 9. Why Kafka? Decoupling
      [Diagram, 2014: Transactions, Metrics, Web logs, Hadoop, Warehouse, Alerting, Audit Logs, Security, all connected point to point. Connections = O(Systems^2)]
      [Diagram, 2015+?: the same systems connected through Kafka. Connections = O(Systems)]
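The jump from O(Systems^2) to O(Systems) on the slides above can be checked with a little arithmetic. The sketch below is illustrative only: the function names, and the split of the slide's eight systems into four producers and four consumers, are assumptions and not something the slides state.

```python
def point_to_point_links(producers: int, consumers: int) -> int:
    """Every producer connects directly to every consumer."""
    return producers * consumers


def hub_links(producers: int, consumers: int) -> int:
    """Every system connects once, to a central hub such as Kafka."""
    return producers + consumers


# Assumed split of the eight systems on the slide: 4 producers, 4 consumers.
for n in (4, 8, 16):
    print(f"{n} producers x {n} consumers: "
          f"direct={point_to_point_links(n, n)}, via hub={hub_links(n, n)}")
```

Doubling the number of systems roughly quadruples the direct connections but only doubles the connections through a hub.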
  • 10. Why Kafka? Because logs.
      • Distributed, structured logs are very useful
      • Resiliency / replication
          • Database write-ahead logs (HBase WAL, Oracle Redo-logs, etc)
      • System decoupling
          • Enterprise service buses (ESBs)
          • Data integration (change data capture)
      • Stream processing (e.g. real-time alerts)
      • Consensus (using logical clocks)
  • 11. What is Kafka?
      • Kafka is …
      [Diagram: Transactions, Metrics, Web logs, Hadoop, Warehouse, Alerting, Audit Logs, Security, connected through Kafka]
  • 12. What is Kafka?
      • Kafka is a distributed, …
      [Diagram: the same systems connected through a Kafka cluster of three Brokers]
  • 13. What is Kafka?
      • Kafka is a distributed, topic-oriented, …
      [Diagram: Sources 1 to 3 write to Topic 1 and Topic 2 on a Broker; Sinks 1 and 2 read from them]
  • 14. What is Kafka?
      • Kafka is a distributed, topic-oriented, partitioned, …
      [Diagram: Topic 1 and Topic 2 are each split into Partition 1 and Partition 2, spread over two Brokers]
  • 15. What is Kafka?
      • Kafka is a distributed, topic-oriented, partitioned, replicated commit log.
      [Diagram: each Broker holds Partition 1 and Partition 2 of both Topic 1 and Topic 2]
  • 16. What is Kafka?
      • Kafka is a distributed, topic-oriented, partitioned, replicated commit log.
      • Kafka is also a pub-sub messaging system.
      • Messages can be text (e.g. syslog), but binary is best (preferably Avro!).
      [Diagram: as on the previous slide]
  • 17. Architectural Overview
      • Each machine is called a Broker
      • Data written belongs to Topics (analogous to a Table in a database)
      • Each Topic is partitioned
      • Partitions are distributed across the Brokers
      • Partitions are also replicated (one replica per partition is the Leader Partition)
      • Producers and Consumers talk to the Leader Partition
      [Diagram: Kafka Cluster with Producers and Consumers. Broker 1: Partition 1 (Leader), Partition 2, Partition 3. Broker 2: Partition 2 (Leader), Partition 1, Partition 3. Broker 3: Partition 3 (Leader), Partition 1, Partition 2]
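The layout on the Architectural Overview slide (three brokers, three partitions, each broker leading one partition and holding replicas of the others) can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: `assign_replicas` uses a plain round-robin rule and `current_leader` picks the first live replica; both names and both rules are hypothetical simplifications, not Kafka's actual assignment or leader-election logic.

```python
from typing import Dict, List, Set


def assign_replicas(num_brokers: int, num_partitions: int,
                    replication_factor: int) -> Dict[int, List[int]]:
    """Map each partition to an ordered broker list; the first entry is the leader.

    Simplified round-robin for illustration, not Kafka's real algorithm.
    """
    if not 1 <= replication_factor <= num_brokers:
        raise ValueError("replication_factor must be between 1 and num_brokers")
    layout: Dict[int, List[int]] = {}
    for p in range(1, num_partitions + 1):
        # Start at broker p and wrap around; brokers are numbered from 1.
        layout[p] = [((p - 1 + i) % num_brokers) + 1
                     for i in range(replication_factor)]
    return layout


def current_leader(replicas: List[int], live_brokers: Set[int]) -> int:
    """Return the first replica that is still alive (assumed failover rule)."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    raise RuntimeError("no live replica for this partition")


layout = assign_replicas(num_brokers=3, num_partitions=3, replication_factor=3)
for partition, replicas in layout.items():
    print(f"Partition {partition}: leader=Broker {replicas[0]}, replicas={replicas}")

# If Broker 1 fails, Partition 1 is served by the next live replica.
print("Partition 1 leader after Broker 1 fails:",
      current_leader(layout[1], live_brokers={2, 3}))
```

Because producers and consumers only talk to the leader, losing a broker changes which replica serves a partition but not the partition's contents, as long as another replica is alive.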
  • 18. The Kafka Advantage
      High-Throughput & Low Latency
      • One broker can handle 100s of MB of reads/writes per second, from 1000s of clients
      • Messages delivered in milliseconds
      Durability & Reliability
      • Zero data loss with messages persisted on disk and replicated within the cluster
      • Highly-available with fault-tolerance built into the system.
      Scalability & Flexibility
      • Elastically and transparently add more machines without downtime for horizontal scalability
      • Dynamically add Producers & Consumers
      • Enable real-time & batch consumption
      Cost-Efficient
      • Modest cluster optimized to handle millions of messages per second
      • Open standard for long-term value
      • With Cloudera, a single system for multiple workloads
  • 19. How does it compare to Flume and Traditional Messaging
  • 20. Kafka Vs Flume
      Kafka
      • Kafka is very much a general-purpose system. Many producers and many consumers sharing multiple topics
      • Kafka has a significantly smaller producer and consumer ecosystem
      • Kafka requires an external stream processing system for in-flight data processing
      • Highly Available ingest pipeline
      Flume
      • Flume is a special-purpose tool designed to send data to HDFS, HBase (and Solr)
      • Flume has many built-in sources and sinks
      • In-flight data processing using interceptors. Useful for data masking or filtering
      • Flume does not replicate events
  • 21. Random and Sequential Access in Disk and Memory
      Source: http://queue.acm.org/detail.cfm?id=1563874
  • 22. Why is Kafka fast?
      Kafka
      • Kafka does only sequential file I/O
      • Kafka keeps a single pointer into each partition of a topic. All messages prior to the pointer are considered consumed, and all messages after it are considered unconsumed
      • Relies heavily on the OS page cache for data storage, and on zero-copy transfer
      • No GC, No Memory overhead
      • Kafka supports end-to-end batching and compression of messages
      Traditional Messaging
      • Traditional messaging does random file/memory I/O (BTree structures)
      • Typically, messaging systems keep some kind of per-message state about what has been consumed and have to update it
      • Disk/Memory is used for storage
      • JVM == GC and memory overhead
      • Traditional messaging is typically non-batched and uncompressed
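The "single pointer into each partition" idea from the slide above is easy to see in a toy model. The sketch below is an in-memory illustration only: `PartitionLog` and `Consumer` are hypothetical names, not Kafka client classes, and a real partition is a file on disk rather than a Python list.

```python
from typing import List


class PartitionLog:
    """Append-only sequence of records; each record's index is its offset."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        """Append a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100) -> List[bytes]:
        """Return up to max_records records starting at offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks one integer per partition instead of per-message state."""

    def __init__(self, log: PartitionLog) -> None:
        self.log = log
        self.offset = 0  # everything before this offset counts as consumed

    def poll(self, max_records: int = 100) -> List[bytes]:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the pointer, e.g. back to 0 to replay the partition."""
        self.offset = offset


log = PartitionLog()
for line in (b"login", b"click", b"logout"):
    log.append(line)

alerting, warehouse = Consumer(log), Consumer(log)
print(alerting.poll(max_records=2))   # first two records
print(warehouse.poll())               # all three; independent of alerting
alerting.seek(0)
print(alerting.poll())                # replay from the beginning
```

Since the log itself is never modified by reads, adding a consumer or replaying old events costs only a new integer, which is the contrast the slide draws with per-message bookkeeping.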
  • 23. Canonical Use Cases
      • Real-Time Stream Processing
      • General-Purpose Message Bus
      • User Activity Data Collection
      • Operational Metrics Collection (applications, servers, or devices)
      • Log Aggregation
      • Change Data Capture
      • Distributed Systems Commit Log
  • 24. Kafka and Cloudera
  • 25. Simplified Management
      • Deploy and Configure Kafka clusters
      • Unified Management
      • Multiple Kafka clusters
      • Entire platform
      • Monitoring, Alerts, and Dashboards
  • 26. Configure Kafka using CM
  • 27. CM has much more!
  • 28. CM has much more!
  • 29. CM has much more!
  • 30. Kafka + Apache Flume
      • Kafka can be configured as a fast, reliable Flume Channel
      • Flume Sources and Sinks can be used as out-of-the-box Kafka Producers and Consumers
      Flume Sinks Consume from Kafka:
      Write data to HDFS, HBase, or Search
      Flume Sources Write to Kafka:
      Read from logs, files, JMS, HTTP, RPC, Thrift, etc and write events to Kafka
  • 31. Cloudera + Kafka
      Community involvement and contribution:
      • Spearheading adding security features to Kafka
      • Identified and fixed core architectural issues to make Kafka fully reliable
      • Strong relationship with Confluent.io and other Kafka Committers
      Support expertise and experience:
      • Multiple production customers
      • Support team trained by Kafka Committers
      Integrated with Cloudera's production-ready platform:
      • Cloudera Manager CSD makes it easy to deploy, configure, and monitor Kafka clusters
      • End-to-end workloads with other components, all on a single system
      • Leading security, governance, administration, and partner network
  • 32. Roadmap
      Security:
      • Authentication with Kerberos
      • Topic level Authorization
      • SSL encryption of data over-the-wire
      • Improved Cloudera Manager integration
      • HUE integration
      *Roadmap is subject to change
  • 33. Demo
  • 34. Kafka Demo
      • Install and configure Kafka on Cloudera cluster
      • Client tools - Add and consume data from topics
      • Replication and Failover capabilities
      • Flume Integration and demo of Kafka to Flume to HDFS
  • 35. Other Topics
  • 36. Clients/APIs
      • Java, Python, Go, C/C++, .Net, Clojure, Ruby, Erlang, stdin/stdout and more here, https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-ProducerDaemon
      • Producer and Consumer API
      • New Java Producer API arrived in 0.8.2
      • New consumer API is coming in the next release
  • 37. Mirror Maker
      • Multi Kafka Cluster replication, HA across datacenters
  • 38. Camus/Samza/Kafka Manager
      • Camus and Samza are tools created and used at LinkedIn
      • Camus is a client for ingesting Kafka data into Hadoop (MR jobs under the covers)
      • Camus is being phased out and replaced with Gobblin
      • Samza is a stream processing framework that uses Kafka for messaging and YARN for processing (resource management etc)
      • Kafka Manager is a management tool for Kafka developed @ Yahoo
  • 39. Thank You