Ingestion Comparison

Thomas Schreiter
Insight Data Engineering Fellow

Ingestion = Message Queuing System
[Diagram: Producers, Consumers]

Research Question:
How fast can data be produced into Kinesis/Kafka if all producers run on only one node?
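
One way to frame this question in code is to start several producers on a single machine and divide the total number of messages by the wall-clock time. The sketch below is only an illustration under stated assumptions: the producers are threads rather than separate Producer.py processes, and `send` is a hypothetical callable standing in for whatever client call delivers a message to Kafka or Kinesis. It is not the code used for the talk.

```python
import threading
import time


def run_producers(num_producers, msgs_per_producer, send):
    """Run producer threads on this node and return aggregate messages per second.

    `send` is a hypothetical stand-in for a real Kafka or Kinesis client call.
    """
    def produce(producer_id):
        for seq in range(msgs_per_producer):
            send(f"Message #{seq} from producer {producer_id}")

    threads = [
        threading.Thread(target=produce, args=(producer_id,))
        for producer_id in range(num_producers)
    ]
    started = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every producer has finished
    elapsed = time.perf_counter() - started

    total = num_producers * msgs_per_producer
    return total / elapsed if elapsed > 0 else float("inf")
```

Calling `run_producers` with an increasing `num_producers` and the same `send` gives one throughput value per producer count, which is the shape of measurement the question asks for.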

Producers
1x m3.medium
DEMO ….

Throughput over #producers
[Figure: Throughput over Bulk Size. x-axis: Bulk Size [msg], ticks at 1, 2, 5, 10, 20, 50, 100, 200, 500; y-axis: Throughput [msg/sec], 0 to 35,000; series: Kafka, Kinesis.]
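
The bulk-size axis in the figure suggests that messages were grouped before sending. A minimal sketch of that idea follows, assuming a hypothetical `send_bulk` callable that accepts a list of messages; the real client calls for Kafka and Kinesis are not shown in the slides and are not reproduced here.

```python
import time


def chunked(messages, bulk_size):
    """Yield consecutive lists of at most bulk_size messages."""
    for start in range(0, len(messages), bulk_size):
        yield messages[start:start + bulk_size]


def measure_throughput(messages, bulk_size, send_bulk):
    """Send all messages in bulks and return the observed messages per second.

    `send_bulk` is a hypothetical stand-in for one client call per bulk.
    """
    started = time.perf_counter()
    for bulk in chunked(messages, bulk_size):
        send_bulk(bulk)
    elapsed = time.perf_counter() - started
    return len(messages) / elapsed if elapsed > 0 else float("inf")
```

Running `measure_throughput` once per bulk size (for example the tick values 1, 2, 5, ... 500 from the figure) with the same message list yields one point per bulk size for each system.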

[Diagram labels]
Producer.py
Producer.py
4x m3.large
1x m3.medium
1x m3.medium
1 stream
“Message #0 to Kafka @ 12:39:04.300”
“Message #1 to Kafka @ 12:39:04.310”
…
“Message #0 to Kinesis @ 13:00:05.700”
“Message #1 to Kinesis @ 13:00:05.702”
…
logger
metrics
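
The timestamped log lines above carry enough information to derive a throughput figure. The sketch below shows one possible way to do that, assuming the lines follow exactly the format shown on the slide and that all timestamps fall on the same day. The function name and the idea of computing metrics from the log text are assumptions for illustration, not a description of the actual logger or metrics components.

```python
import re
from datetime import datetime

# Matches lines such as: Message #1 to Kafka @ 12:39:04.310
LINE_RE = re.compile(r"Message #(\d+) to (\w+) @ (\d{2}:\d{2}:\d{2}\.\d{3})")


def throughput_per_system(lines):
    """Return {system: messages per second} computed from timestamped log lines."""
    stamps = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not message records
        system = match.group(2)
        stamp = datetime.strptime(match.group(3), "%H:%M:%S.%f")
        stamps.setdefault(system, []).append(stamp)

    rates = {}
    for system, seen in stamps.items():
        elapsed = (max(seen) - min(seen)).total_seconds()
        # n messages span n - 1 intervals; skip systems with a zero-length window
        if len(seen) > 1 and elapsed > 0:
            rates[system] = (len(seen) - 1) / elapsed
    return rates
```

Because the rate is taken over the interval between the earliest and latest timestamp, a meaningful value needs many log lines per system rather than a two-line excerpt.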

[Diagram labels, extended setup]
Producer.py
Producer.py
4x m3.large
1x m3.medium
1x m3.medium
1x m3.medium
1x t2.micro
1 stream

Engineering Challenges
Install scripts: tried to automate everything ☺
broke Kafka installation in Week 2 ☹
and again in Week 4 ☹ ☹ ☹
but Engineering puzzles are really fun ☺☺☺

And I read Kafka for the first time

Thomas Schreiter
[thomas.dataengineer@gmail.com]
M.Sc. + B.Sc. in Computer Science @Karlsruhe Institute of Technology, Germany
Ph.D. in Transportation @Delft University of Technology, The Netherlands
Before Insight: Research Engineer in Transportation @UC Berkeley

AWS Costs

Throughput over #partitions
[Figure: y-axis: Throughput [#msg/sec], 0 to 1200; x-axis: 1, 2, 3, and 4 partitions; series: Kafka, Kinesis.]

Older results
[Figure: y-axis: Throughput [#msg/sec], 0 to 2000; x-axis: 1, 2, 3, and 4 partitions; series: Kafka, Kinesis.]
