© Copyright 2016 EMC Corporation. All rights reserved.
Improved Reliable Streaming Processing:
Apache Storm as example
Frank Zhao, EMC CTO Office,
Fenghao Zhang*, Microsoft Bing,
Yusong Lv*, Peking University
Special thanks to EMC Ken Taylor, John Cardente and Lincourt Robert
*Zhang and Lv contributed to the research when they worked at EMC China COE
The technology concepts being discussed and demonstrated are
the result of research conducted by the Advanced Research &
Development (ARD) team from the EMC Office of the CTO. Any
demonstrated capability is for research purposes only and at a
prototype phase; therefore: THERE ARE NO IMMEDIATE PLANS
• Distributed Streaming System
• Reliable Processing
• Apache Storm’s Solution, the Challenge
• New Proposed Approaches
– Fingerprint, and share-split
• Prototyping with Apache Storm and Benchmark
• Summary and Outlook

Streaming processing
• As a service, continuously process data (a.k.a. messages or tuples) in a scalable, reliable and high-performance way (millisecond level)
– Open source: Storm, Flink, Spark Streaming, Samza

Streaming vs. batch processing
 | Streaming processing (Storm, Spark Streaming) | Batch processing (Hadoop MR)
Type | Continuous (never stops), real-time (ms level) |
Model | DAG/graph | MapReduce-like jobs
Workload | CPU/memory intensive | CPU/memory and I/O intensive
State | Stateless, may checkpoint periodically | Stateful
Cluster | Master-slave with ZooKeeper (Storm); fault-tolerance/HA | Master-slave or job-task; fault-tolerance/HA

Storm, Flink, Spark Streaming*
 | Storm | Flink | Spark
Built since | 2011 (Apache, Trident); 2016 (Twitter Heron) | 2014 (Apache) | ~2013
Streaming | Native (micro-batch with Trident) | Native | Micro-batch
Guarantee | At least once (exactly-once with Trident) | Exactly-once | Exactly-once
Fault-tolerance | Ack per message | Checkpoint | Checkpoint
Latency | 5 | 4 | 3
Throughput | 4 | 5 | 5
Ecosystem | 5 | 3 | 3
*Personal observations for reference only

Reliable processing
• Every message shall be guaranteed to be processed
– At-most once
– At-least once
– Exactly once
[Figure: topology (DAG): data source, bolts (worker, task, op), results may be saved]

Apache Storm
• Scalable
• Fault-tolerant
• Guaranteed message processing
– At least once (default)
• Fast: ms level
– Pure in-memory computing, no checkpoint
• Simple programming model
– Topology, Spouts, Bolts
– Clojure, Java, Ruby, Python …

Storm: designs for fault-tolerance
• Deploy topology
• Dispatch tasks
• Monitor cluster
• Coordination
• States of Nimbus
• State of supervisor
• …
[Figure: tasks]
These fault-tolerance mechanisms are about threads, tasks, jobs or nodes, NOT messages

Track processing status in DAG
• Critical: message granularity (NOT thread/task/job/node)
• Need an efficient method, considering
– Every component may fail
– Large topology, continuous flood of messages
– Network temporarily unavailable, out-of-order traffic, …
– Minimized resource usage (network, CPU, memory)
[Figure: DAG fed by a data source]

Tough problem and Storm’s answer!
History of Apache Storm and lessons learned
– Nathan Marz, creator of Storm

Storm reliability track algorithm
[Figure: Acker status table, srcNodeID: R, R; Acks sent along the topology:]
R ⊕ A ⊕ B ⊕ C
A ⊕ D
B ⊕ E
C ⊕ F
D ⊕ E ⊕ F
Status = R ⊕ (R ⊕ A ⊕ B ⊕ C) ⊕ (A ⊕ D) ⊕ (B ⊕ E) ⊕ (C ⊕ F) ⊕ (D ⊕ E ⊕ F) = 0
1. Each msg has an ID (8B random number)
2. Each bolt runs XOR (inMsgID, outMsgID[]) per inMsg
3. Each bolt sends the XOR result (per inMsg) to the Acker
4. The Acker runs XOR: its status is always 8B (regardless of topology size)
5. Finally, within the timeout, Acker.status shall be 0, which means OK; otherwise something failed (may raise a false alarm, but never misses a failure)
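The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Storm's actual code: the names `new_id`, `bolt_ack` and `Acker` are invented for the example, and the topology is the one in the figure (R fans out to A, B, C, which produce D, E, F, all consumed by one leaf).

```python
import secrets

def new_id() -> int:
    """Step 1: random 8-byte message ID."""
    return secrets.randbits(64)

def bolt_ack(in_id: int, out_ids: list) -> int:
    """Steps 2-3: XOR of one input ID with the IDs of the messages it produced."""
    value = in_id
    for out_id in out_ids:
        value ^= out_id
    return value

class Acker:
    """Step 4: a single 8-byte status per root message (hypothetical class)."""
    def __init__(self, root_id: int) -> None:
        self.status = root_id           # initialised with the root ID R

    def ack(self, value: int) -> None:
        self.status ^= value

    def done(self) -> bool:
        return self.status == 0         # step 5: 0 means fully processed

R, A, B, C, D, E, F = (new_id() for _ in range(7))
acks = [
    bolt_ack(R, [A, B, C]),   # first bolt: R in; A, B, C out
    bolt_ack(A, [D]),
    bolt_ack(B, [E]),
    bolt_ack(C, [F]),
    D ^ E ^ F,                # leaf consumes D, E, F and emits nothing
]

acker = Acker(R)
for value in acks:
    acker.ack(value)
print(acker.done())           # True

lossy = Acker(R)
for value in acks[:-1]:       # the leaf's Ack never arrives
    lossy.ack(value)
print(lossy.status == D ^ E ^ F)   # True: the residue is what the leaf owed
```

The Acker state stays one 64-bit integer no matter how many bolts report, which is the "always 8B" property in step 4.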

• Random number + XOR based: the key foundation of Storm, which has been running for 5+ years
– Smart, simple and pretty good!
– Smallest memory footprint at the Acker, regardless of topology
– Reliable*, regardless of Ack traffic order
– XOR obeys the commutative and associative laws
• Easy to handle any out-of-order traffic
*In theory, random IDs may collide
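Because XOR is commutative and associative, the order in which Acks reach the Acker does not change the final status. A small sketch of that property, where the Ack values are arbitrary random stand-ins rather than real Storm traffic:

```python
import random
from functools import reduce
from operator import xor

rng = random.Random(2016)                  # fixed seed so the run is repeatable
acks = [rng.getrandbits(64) for _ in range(1000)]

in_order = reduce(xor, acks, 0)

shuffled = acks[:]
rng.shuffle(shuffled)                      # simulate out-of-order delivery
out_of_order = reduce(xor, shuffled, 0)

print(in_order == out_of_order)            # True
```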

• Network traffic and CPU overhead → latency & throughput impact
– Possibility of random number collision
[Figure: non-reliable processing: 25,000 msg/sec; reliable processing: 9,300 msg/sec*]
*Third-party benchmark from 2012; things may have changed since

Current algorithm is fantastic, however
Ack only at leaf?
[Figure: topology fed by a data source]

2 new proposed approaches
• Same level of guaranteed reliable processing
• More scalable, efficient and fast
– Much less Ack traffic; usually only at leaf nodes
– Same memory footprint, less CPU usage
– Eventually better latency/throughput
Currently in research & quick validation phase

Approach-1: fingerprint based
• An evolution based on random number + XOR
Currently, each ID is XORed as a pair (send, receive), so the result is 0
Further, an ID XORed in multiple pairs (2, 4, 6, … times) still gives 0

Approach-1: fingerprint idea
• Fingerprint (FP): a digest (e.g., 8B) of {in msgs, out msgs and parent.fp} that encodes and represents the context and is recursively passed down, so that each downstream message inherits genes from all its ancestors
– Still uses XOR of IDs, redundant in a scalable way
– 3 rules: Embedded, Recursively inherited and Append-only update
• Embedded: as part of metadata
• Recursive-inherit: pass-down
• Append-update: via XOR
[Figure: input msg <Mj, FPj>; the bolt append-updates the FP with InMsgID XOR [outMsgIDs] and passes it down in its output msgs <Mj+1, FPj:i>, <Mj+2, FPj:i>, …]

Fingerprint example
[Figure: srcNodeID: RootMsgID, R; Acker init: R; messages A, B, C carry FP0; D carries FP1; E carries FP2; F carries FP3; each bolt calculates the FP; Acks may be batched]
FP0 = R ⊕ A ⊕ B ⊕ C
FP1 = FP0 ⊕ A ⊕ D
FP2 = FP0 ⊕ B ⊕ E
FP3 = FP0 ⊕ C ⊕ F
Leaf sends 3 Acks:
FP4-D = FP1 ⊕ D
FP4-E = FP2 ⊕ E
FP4-F = FP3 ⊕ F
→ Acker.status = R ⊕ (FP0 ⊕ A ⊕ D) ⊕ D ⊕ (FP0 ⊕ B ⊕ E) ⊕ E ⊕ (FP0 ⊕ C ⊕ F) ⊕ F = 0

Approach-1: failure example
[Figure: srcNodeID: RootMsgID, R; Init = R; messages A, B, C carry FP0; D carries FP1; E carries FP2; F carries FP3]
If msg D fails, node 4 only Acks FP4-E and FP4-F, so finally
Acker.status = R ⊕ FP4-E ⊕ FP4-F
= R ⊕ FP2 ⊕ E ⊕ FP3 ⊕ F
= R ⊕ (FP0 ⊕ B ⊕ E ⊕ E) ⊕ (FP0 ⊕ C ⊕ F ⊕ F)
= R ⊕ B ⊕ C != 0
→ Info about the A/D path is missing, due to the failure!
Another example: if all messages fail, the Acker status stays R != 0
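The success case and the lost-message case of the fingerprint approach can be checked numerically. The sketch below is only an illustration of the slide arithmetic: the function name `fingerprint_status` and its `lost` parameter are assumptions made for the example, and message IDs are random 64-bit integers.

```python
import secrets

def fingerprint_status(lost=()):
    """Return (Acker status, IDs) for the example topology.

    `lost` names leaf inputs ("D", "E", "F") whose Ack never arrives.
    """
    ids = {name: secrets.randbits(64) for name in "RABCDEF"}
    R, A, B, C, D, E, F = (ids[n] for n in "RABCDEF")

    fp0 = R ^ A ^ B ^ C        # first bolt: R in; A, B, C out
    fp1 = fp0 ^ A ^ D          # carried by D
    fp2 = fp0 ^ B ^ E          # carried by E
    fp3 = fp0 ^ C ^ F          # carried by F

    # Only the leaf reports: one Ack per message it received.
    leaf_acks = {"D": fp1 ^ D, "E": fp2 ^ E, "F": fp3 ^ F}

    status = R                 # Acker is initialised with R
    for name, value in leaf_acks.items():
        if name not in lost:
            status ^= value
    return status, ids

ok_status, _ = fingerprint_status()
print(ok_status)               # 0

bad_status, ids = fingerprint_status(lost={"D"})
print(bad_status == (ids["R"] ^ ids["B"] ^ ids["C"]))   # True
```

With all three leaf Acks the three copies of FP0 collapse to one, which cancels against R and the remaining IDs; dropping the Ack for D leaves the residue R ⊕ B ⊕ C.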

Approach-1: a complex example
Initial: R
FP1 = R ⊕ A ⊕ B ⊕ C
FP2 = FP1 ⊕ A ⊕ D
FP3 = FP1 ⊕ B ⊕ X
FP4 = FP1 ⊕ C ⊕ E
// send FP5 to the Acker, since the number of downstreams is even (2)
FP5 = FP2 ⊕ D ⊕ FP3 ⊕ X ⊕ FP4 ⊕ E ⊕ (F ⊕ G)
FP6 = FP5 ⊕ F ⊕ H
FP7 = FP5 ⊕ G ⊕ I
// bolt 8 sends FP8 to the Acker
FP8 = FP6 ⊕ H ⊕ FP7 ⊕ I
Final status = R ⊕ FP5 ⊕ FP8
= R ⊕ FP5 ⊕ (FP5 ⊕ F) ⊕ (FP5 ⊕ G)
= R ⊕ FP5 ⊕ (F ⊕ G)
= R ⊕ FP2 ⊕ D ⊕ FP3 ⊕ X ⊕ FP4 ⊕ E
= R ⊕ (FP1 ⊕ A ⊕ B ⊕ C)
= 0
Limits and notes: 1) the number of downstream msgs shall be odd (1, 3, 5, …); otherwise, the bolt must send the new FP to the Acker, which XORs it into its status; 2) to implement this approach, the bolt ideally needs to know the total number of downstream msgs so that it can generate the FP before emitting.
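The derivation above can be replayed with random IDs. This is a sketch of the slide arithmetic only; the variable names mirror the slide, and calling the node that computes FP5 "the aggregating bolt" is an assumption.

```python
import secrets

R, A, B, C, D, X, E, F, G, H, I = (secrets.randbits(64) for _ in range(11))

fp1 = R ^ A ^ B ^ C
fp2 = fp1 ^ A ^ D
fp3 = fp1 ^ B ^ X
fp4 = fp1 ^ C ^ E

# The aggregating bolt consumes D, X, E and emits two messages (F, G).
# Its fan-out is even, so it also reports FP5 to the Acker.
fp5 = (fp2 ^ D) ^ (fp3 ^ X) ^ (fp4 ^ E) ^ (F ^ G)
fp6 = fp5 ^ F ^ H
fp7 = fp5 ^ G ^ I
fp8 = (fp6 ^ H) ^ (fp7 ^ I)      # sent by the leaf (bolt 8)

status = R ^ fp5 ^ fp8
print(status)                     # 0

# Without the extra FP5 report, the even fan-out leaves R ^ F ^ G behind.
print((R ^ fp8) == (R ^ F ^ G))   # True
```

The second check shows why note 1) matters: with two downstream messages the two copies of FP5 cancel each other, so the Acker needs FP5 once more to reach zero.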

Approach-2: share split
• For the input rootMsg, INIT a BIG SHARE (8B), EMBED it as metadata and pass it down
• Storm SPLITs the attached share at each bolt and EMBEDs it; repeat this until the leaf ...
• Only the leaf ACKs to the Acker, reporting the share it received
• Acker REDO: decrease its value by the reported share; finally 0 means OK, otherwise failure
– No random numbers (no collision), no XOR; inline embedded; the split is transparent to the app
– +/- (mod): follows the commutative & associative laws, which resolves the out-of-order issue
[Figure: srcNodeID: rootMsgID, BIG-Share; shares per message: B, 50; C, 50; D, 25; E, 25; F, 17; G, 17; H, 16; I, 25; J, 25; K, 17; L, 17; M, 16; Acker records: A, 1, 100; A, 0, 16; A, 0, 84]
Like: IPO/stock share, split, increase share
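The share-split bookkeeping above can be sketched as follows. This is an illustration, not the prototype's Clojure code: `split_share` and `ShareAcker` are hypothetical names, and the rule that the first children absorb the remainder is an assumption chosen so that 50 splits into 17, 17, 16 as in the figure.

```python
def split_share(share: int, parts: int) -> list:
    """Split `share` into `parts` positive integers that sum to `share`."""
    if parts < 1 or share < parts:
        raise ValueError("share too small to split")
    base, extra = divmod(share, parts)
    # The first `extra` children get one unit more than the rest.
    return [base + 1 if i < extra else base for i in range(parts)]

class ShareAcker:
    """Tracks the outstanding share of one root message (hypothetical class)."""
    def __init__(self, big_share: int) -> None:
        self.remaining = big_share

    def ack(self, share: int) -> None:
        self.remaining -= share        # leaf reports the share it holds

    def done(self) -> bool:
        return self.remaining == 0

BIG_SHARE = 100
acker = ShareAcker(BIG_SHARE)

b, c = split_share(BIG_SHARE, 2)                  # 50, 50
leaves = split_share(b, 2) + split_share(c, 3)
print(leaves)                                     # [25, 25, 17, 17, 16]

for share in reversed(leaves):                    # Ack order does not matter
    acker.ack(share)
print(acker.done())                               # True
```

Only the five leaf shares reach the Acker; the intermediate bolts split locally and send nothing.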

Approach-2: share split (cont’d)
• Rare case: INCREASE the share if it is insufficient to split (and sync up the Acker)
• The Acker then ADDs the newly increased share (NOT decrease)
[Figure: srcNodeID, RootMsgID, Share; A, 100 splits into B, 99 and C, 1; the share of C is increased and the Acker is synced up (A, +99); F, 33; G, 33; H, 34]
If S - S1 - S2 - … = Sn, then S - S1 - S2 - … - Sn = 0
(Acks may be batched)
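The rare "increase share" path can be sketched with the numbers from the figure. The helper names and the top-up target of 100 are assumptions made for illustration; the slides list this operation as not yet implemented in the prototype.

```python
class ShareAcker:
    """Outstanding share of one root message (hypothetical class)."""
    def __init__(self, share):
        self.remaining = share

    def increase(self, extra):
        self.remaining += extra        # sync-up: the Acker ADDs the new share

    def ack(self, share):
        self.remaining -= share

def split_or_increase(share, parts, acker, target=100):
    """Split `share` into `parts`; top it up first if it is too small."""
    if share < parts:
        extra = max(target, parts) - share
        acker.increase(extra)
        share += extra
    base = share // parts
    # The last part takes whatever is left over.
    return [base] * (parts - 1) + [share - base * (parts - 1)]

acker = ShareAcker(100)                 # A, 100
b, c = 99, 1                            # uneven split from the figure
fgh = split_or_increase(c, 3, acker)    # C holds 1 but needs 3 parts
print(fgh)                              # [33, 33, 34]
print(acker.remaining)                  # 199

for share in [b] + fgh:                 # leaf Acks: 99 + 33 + 33 + 34
    acker.ack(share)
print(acker.remaining)                  # 0
```

The increase message and the leaf Acks are plain additions and subtractions, so they can reach the Acker in any order and still sum to zero.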

• Implemented Approach-2 (share split)
• Integrated with Storm 1.0.1 (released in May 2016)
– Storm core (~200 LOC in Clojure, a LISP-like language) and Java APIs (~200 LOC including some traces/tests)
• Implementation notes:
– Supports BasicBolt, removes randomNum, re-uses some existing structures/APIs, e.g., Anchors-to-ids (RootID:shareAttached) and Ack sending
– Global pre-defined split of the share at all bolts (equal split)
• Next: configurable split approach per bolt
– To split the share exactly, build a 1-step delayed emit
• Pre-split the input share
• Once a new tuple is generated, emit queues it internally until the next tuple comes out
• Finally, explicitly call emitDone(), so the last tuple takes over all the remaining share and is emitted
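The 1-step delayed emit described above can be sketched as below. Only `emitDone()` is named in the slides; the class `DelayedEmitter`, its `parts` parameter (the pre-defined split) and the `send` callback are assumptions for illustration and do not reflect the actual Storm patch.

```python
class DelayedEmitter:
    """Holds back the newest tuple so the last one can take the leftover share."""

    def __init__(self, in_share, parts, send):
        self.remaining = in_share
        self.slice = max(1, in_share // parts)   # pre-split the input share
        self.pending = None                      # tuple waiting for a successor
        self.send = send                         # callback: send(tuple, share)

    def emit(self, tup):
        if self.pending is not None:
            # A newer tuple exists, so the held one is not the last.
            give = min(self.slice, self.remaining - 1)   # keep >= 1 for the last
            if give < 1:
                raise ValueError("share exhausted; would need 'increase share'")
            self.remaining -= give
            self.send(self.pending, give)
        self.pending = tup

    def emit_done(self):
        if self.pending is not None:
            # The last tuple takes over all the remaining share.
            self.send(self.pending, self.remaining)
            self.remaining = 0
            self.pending = None

sent = []
emitter = DelayedEmitter(in_share=100, parts=4,
                         send=lambda tup, share: sent.append((tup, share)))
for word in ["a", "b", "c"]:
    emitter.emit(word)
emitter.emit_done()
print(sent)    # [('a', 25), ('b', 25), ('c', 50)]
```

The shares handed out always sum to the input share, regardless of how many tuples the bolt ends up emitting, which is why the explicit final call is needed.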

• Function & performance
– Network traffic, CPU, latency/throughput
• Reference: IBM whitepaper (Storm vs. IBM InfoSphere): 7 layers
– We use Wikipedia as the data source; word processing
[Figure: 3 hosts connected at 1000 Mbps, each with Ubuntu 15.10 (4.2.0), Storm 1.0.1, E5-2643 @ 3.40GHz, 24 cores]

Result: function & performance
• Function: inject errors and validate reliability detection: Pass
– Same level of reliability as the existing approach
• Performance: same HW/SW config and processing logic
– 16KB tuples, 100 pending, parallelism of 48 per bolt
– 4 workers & 12 Ackers per host

Test1: 3 layers
• 1/3 Ack traffic, 18% faster, 9% less CPU
[Charts, Current vs. New: Ack traffic (millions); end-to-end latency (ms); CPU (per Java worker)]

Test2: 7 layers
• 1/5 Ack traffic, 23% faster, 14% less CPU
[Charts, Current vs. New: Ack traffic (millions); end-to-end latency (ms); CPU (per Java worker)]

• Larger topology? Quick test of 11 layers:
– 1/9 traffic
• Presumably, the larger the topology, the larger the gains
• Next
– Refine multi-Acker
– Implement the “Increase Share” operation
– Configurable split method per bolt
• So developers can specify the desired split rather than a fixed/global one
• May integrate with Twitter Heron? Or apply to other areas?
– e.g., function call graphs? performance traces? (more…)

End-to-end IoT landscape
Continuous, scalable, real-time processing

Unified data processing
• Lambda architecture: fusion of “historical” + “new” data
– Proposed by Nathan Marz (5 years ago): batch + streaming
– Widely adopted by many Internet companies

• 2 innovative & inspiring streaming reliability algorithms
– Guaranteed, with minimized memory footprint
– More scalable, efficient & fast, and even beautiful
• Demonstrated in Storm
– 1/N Ack traffic, only needed at leaf nodes
• N is the topology depth. Usually there are a few leaves, for aggregation, DB saving, etc.
• Meanwhile, 23% faster, 14% less CPU
– Transparent to the app except for the last explicit emitDone() call
• Applying to other interesting areas...
– Distributed replication, transactions, exact-state tracking, …

• Feedback or comments? Talk with us!
– Any flaws, constraints, or room to improve?
– Then discuss with the Storm community; code can be shared if needed
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3

Último (20)

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx

Improved Reliable Streaming Processing: Apache Storm as example

  • 1. Improved Reliable Streaming Processing: Apache Storm as example. Frank Zhao, EMC CTO Office; Fenghao Zhang*, Microsoft Bing; Yusong Lv*, Peking University. Special thanks to EMC Ken Taylor, John Cardente and Lincourt Robert. *Zhang and Lv contributed to the research when they worked at EMC China COE. © Copyright 2016 EMC Corporation. All rights reserved.
  • 2. DISCLAIMER: The technology concepts being discussed and demonstrated are the result of research conducted by the Advanced Research & Development (ARD) team from the EMC Office of the CTO. Any demonstrated capability is only for research purposes and at a prototype phase; therefore: THERE ARE NO IMMEDIATE PLANS NOR INDICATION OF SUCH PLANS FOR PRODUCTIZATION OF THESE CAPABILITIES AT THE TIME OF PRESENTATION. THINGS MAY OR MAY NOT CHANGE IN THE FUTURE.
  • 3. Agenda
      • Distributed Streaming System
      • Reliable Processing
      • Apache Storm’s Solution, the Challenge
      • New Proposed Approaches
        – Fingerprint, and share-split
      • Prototyping with Apache Storm and Benchmark
      • Summary and Outlook
  • 4. Streaming processing
      • As a service, continuously process data (a.k.a. messages or tuples) in a scalable, reliable and high-performance way (msec)
        – Open-source: Storm, Flink, Spark Streaming, Samza
  • 5. Streaming vs. batch processing
      Columns: Streaming processing (Storm, Spark Streaming) / Batch processing (Hadoop MR)
      Type: Continuous (never-stop), real-time (ms level) / Batch, periodic
      Model: DAG/graph / MapReduce-like jobs
      Workload: CPU/memory intensive / CPU/memory and IO intensive
      State: Stateless, may checkpoint periodically / Stateful
      Cluster: Master-Slave w/ ZooKeeper (Storm) / Master-Slave or Job-task
      Fault-tolerance: Fault-tolerance/HA / Fault-tolerance/HA
  • 6. Storm, Flink, Spark Streaming*
      Columns: Storm / Flink / Spark Streaming
      Built since: 2011 (Apache, Trident), 2016 (Twitter Heron) / 2014 (Apache) / ~2013
      Streaming: Native (micro-batch with Trident) / Native / Micro-batch
      Guarantee: At least once (exactly-once w/ Trident) / Exactly-once / Exactly-once
      Fault-tolerance: Ack per message / Checkpoint / Checkpoint
      Latency: 5 / 4 / 3
      Throughput: 4 / 5 / 5
      Ecosystem: 5 / 3 / 3
      *Personal observations, for reference only
  • 7. Reliable processing
      • Every message shall be guaranteed to be processed
        – At-most once
        – At-least once
        – Exactly once
      [Figure: topology (DAG) with a data source, spout (root message R) and bolts (worker, task, op), nodes 0-9 exchanging messages B-M; the result may be saved at the end]
  • 8. Apache Storm
      • Scalable
      • Fault-tolerant
      • Guaranteed message processing
        – At least once (default)
      • Fast: ms level
        – Pure in-memory computing, no checkpoint
      • Simple programming model
        – Topology, Spouts, Bolts
        – Clojure, Java, Ruby, Python …
  • 9. Storm: designs for fault-tolerance
      • Master: Nimbus
        – Deploy topology
        – Dispatch tasks
        – Monitor cluster
      • ZooKeeper cluster
        – Coordination
        – States of Nimbus
        – State of supervisor
        – …
      • Workers: Supervisor, Executor, Task
      These fault-tolerance mechanisms are about the thread/task/job or node, NOT the message
  • 10. Track processing status in DAG
      • Critical: message granularity (NOT thread/task/job/node)
      • Need an efficient method, considering
        – Every component may fail
        – Large topology, continuous flood of messages
        – Network temporarily unavailable, traffic out-of-order, …
        – Minimized resource usage (network, CPU, memory)
      [Figure: DAG with data source, spout (root message R) and bolts, nodes 0-9 exchanging messages B-M]
  • 11. Tough problem and Storm’s answer! History of Apache Storm and lessons learned – Nathan Marz, creator of Storm
  • 12. Storm reliability tracking algorithm
      1. Each msg has an ID (8B random number)
      2. Each bolt runs XOR (inMsgID, outMsgID[]) per inMsg
      3. Each bolt sends the XOR result (per inMsg) to the Acker
      4. Acker runs XOR: the status is always 8B (regardless of topology size)
      5. Finally, within the timeout, Acker.status shall be 0, which means OK; otherwise something failed (may false-alarm, but never misses)
      [Figure: nodes 0-4 exchange messages A-F; the Acker entry (srcNodeID: R, R) starts at R and receives the acks R ⊕ A ⊕ B ⊕ C, A ⊕ D, B ⊕ E, C ⊕ F and D ⊕ E ⊕ F]
      Status = R ⊕ (R ⊕ A ⊕ B ⊕ C) ⊕ (A ⊕ D) ⊕ (B ⊕ E) ⊕ (C ⊕ F) ⊕ (D ⊕ E ⊕ F) = 0
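The five steps on slide 12 can be sketched in a few lines of Python. This is only an illustration of the XOR bookkeeping, not Storm's actual Acker code: the helper names `bolt_ack` and `acker_status` are hypothetical, and the final bolt is assumed to ack D, E and F separately, which is equivalent to the single D ⊕ E ⊕ F ack on the slide.

```python
from functools import reduce
from operator import xor
import random
import secrets


def bolt_ack(in_id, out_ids):
    """Value a bolt reports per input message: inMsgID XOR all outMsgIDs."""
    return reduce(xor, out_ids, in_id)


def acker_status(root_id, acks):
    """Fold every received ack into one 8-byte status; 0 means fully processed."""
    return reduce(xor, acks, root_id)


# Hypothetical 8-byte random IDs for the root message R and messages A..F.
R, A, B, C, D, E, F = (secrets.randbits(64) for _ in range(7))

acks = [
    bolt_ack(R, [A, B, C]),  # R in; A, B, C out
    bolt_ack(A, [D]),
    bolt_ack(B, [E]),
    bolt_ack(C, [F]),
    bolt_ack(D, []), bolt_ack(E, []), bolt_ack(F, []),  # final bolt emits nothing
]
random.shuffle(acks)  # XOR is commutative and associative, so order is irrelevant
print(acker_status(R, acks) == 0)  # prints True
```

Dropping any single ack from the list leaves a non-zero status (unless two random IDs happen to collide), which is how a lost or failed message shows up at the timeout.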
  • 13. Ingenious!
      • Random number + XOR based: the key foundation of Storm, which has run for 5+ years
        – Smart, simple and pretty good!
        – Least memory footprint at the Acker, regardless of topology
        – Reliable*, regardless of Ack traffic order
        – XOR op: commutative law, associative law
          • Easy to handle any out-of-order delivery
      *: in theory, random IDs may collide
  • 14. Limitations
      • Network traffic and CPU overhead → latency & throughput impact
        – Possibility of random number collision
      Benchmark*: 25000 msg/sec with non-reliable processing vs. 9300 msg/sec with reliable processing
      *3rd-party benchmark from 2012; things may have changed since
  • 15. The current algorithm is fantastic, however: IS IT POSSIBLE to Ack only at the leaf?
      [Figure: DAG with data source, root message R and nodes 0-9 exchanging messages B-M]
  • 16. 2 new proposed approaches
      • Same level of guaranteed reliable processing
      • More scalable, efficient and fast
        – Much less Ack traffic; usually only at leaf nodes
        – Same memory footprint, less CPU usage
        – Eventually better latency/throughput
      Currently in research & quick validation phase
  • 17. Approach-1: fingerprint based
      • An evolution based on random number + XOR
      Currently, IDs are XORed in pairs (send, recv), so the result is 0. Further, XORing multiple pairs (2, 4, 6, …) still gives 0.
  • 18. Approach-1: fingerprint idea
      • Fingerprint (FP): a digest (e.g., 8B) of {in msgs, out msgs and parent.fp} that encodes and represents the context and is then recursively passed down, so that each downstream node inherits "genes" from all its ancestors
        – Still uses XOR of IDs, redundant in a scalable way
        – 3 rules: embedded, recursively inherited and append-only update
          • Embedded: as part of the metadata
          • Recursive-inherit: pass-down
          • Append-update: via XOR (InMsgID XOR [outMsgIDs])
      [Figure: node Ni receives Msg <Mj, FPj>, applies the append update and passes Msg <Mj+1, FPj:i>, Msg <Mj+2, FPj:i>, … down to nodes Ni+1, Ni+2, Ni+3]
  • 19. Fingerprint example
      [Figure: nodes 0-4; Acker entry (srcNodeID: RootMsgID, R), init: R; node 0 calculates the FP and emits (A, FP0), (B, FP0), (C, FP0); nodes 1-3 emit (D, FP1), (E, FP2), (F, FP3); leaf node 4 sends FP4-D, FP4-E, FP4-F to the Acker (may batch)]
      FP0 = R ⊕ A ⊕ B ⊕ C
      FP1 = FP0 ⊕ A ⊕ D
      FP2 = FP0 ⊕ B ⊕ E
      FP3 = FP0 ⊕ C ⊕ F
      The leaf has 3 Ack messages:
      FP4-D = FP1 ⊕ D
      FP4-E = FP2 ⊕ E
      FP4-F = FP3 ⊕ F
      Acker.status = R ⊕ (FP0 ⊕ A ⊕ D) ⊕ D ⊕ (FP0 ⊕ B ⊕ E) ⊕ E ⊕ (FP0 ⊕ C ⊕ F) ⊕ F = 0
  • 20. Approach-1: failure example
      [Figure: same topology as the fingerprint example; Acker entry (srcNodeID: RootMsgID, R), init = R; messages (A, FP0), (B, FP0), (C, FP0), (D, FP1), (E, FP2), (F, FP3); leaf acks FP4-D, FP4-E, FP4-F]
      If msg D fails, node 4 only Acks FP4-E and FP4-F, and finally
      Acker.status = R ⊕ FP4-E ⊕ FP4-F
                   = R ⊕ FP2 ⊕ E ⊕ FP3 ⊕ F
                   = R ⊕ (FP0 ⊕ B ⊕ E ⊕ E) ⊕ (FP0 ⊕ C ⊕ F ⊕ F)
                   = R ⊕ B ⊕ C != 0
      The information about the A/D path is missing, due to the failure.
      Another example: if all messages fail, the Acker status stays R != 0.
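The fingerprint example and its failure case can be checked with a short Python sketch. It assumes the root message carries an empty fingerprint (0) and that a leaf acks its fingerprint XORed with the ID of the message it received, as in FP4-D = FP1 ⊕ D; the function names are hypothetical, and the IDs used in the demo are small distinct bit flags rather than 8-byte random numbers, purely to keep the output readable.

```python
from functools import reduce
from operator import xor


def fingerprint(parent_fp, in_id, out_ids):
    """Append-only update: parent FP XOR input msg ID XOR all output msg IDs."""
    return reduce(xor, out_ids, parent_fp ^ in_id)


def acker_status(R, A, B, C, D, E, F, failed=()):
    """Replay the 5-node example; `failed` names leaf inputs that never arrive."""
    fp0 = fingerprint(0, R, [A, B, C])   # FP0 = R ^ A ^ B ^ C
    fp1 = fingerprint(fp0, A, [D])       # FP1 = FP0 ^ A ^ D
    fp2 = fingerprint(fp0, B, [E])       # FP2 = FP0 ^ B ^ E
    fp3 = fingerprint(fp0, C, [F])       # FP3 = FP0 ^ C ^ F
    leaf_acks = {
        "D": fingerprint(fp1, D, []),    # FP4-D = FP1 ^ D
        "E": fingerprint(fp2, E, []),    # FP4-E = FP2 ^ E
        "F": fingerprint(fp3, F, []),    # FP4-F = FP3 ^ F
    }
    status = R                           # Acker is initialised with R
    for name, fp in leaf_acks.items():
        if name not in failed:
            status ^= fp
    return status


# Demo IDs: one distinct bit per message (illustrative, not random).
R, A, B, C, D, E, F = (1 << i for i in range(7))
print(acker_status(R, A, B, C, D, E, F))                        # 0: all leaf acks arrived
print(acker_status(R, A, B, C, D, E, F, failed=("D",)) == R ^ B ^ C)  # True
```

Only the three leaf acks reach the Acker here, compared with one ack per bolt per input message in the original scheme.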
  • 21. Approach-1: a complex example
      [Figure: root message R, nodes 1-8, messages A, B, C, D, E, F, G, H, I, X; the Acker receives FP5 and FP8]
      Initial: R
      FP1 = R ⊕ A ⊕ B ⊕ C
      FP2 = FP1 ⊕ A ⊕ D
      FP3 = FP1 ⊕ B ⊕ X
      FP4 = FP1 ⊕ C ⊕ E
      // FP5 is sent to the Acker because bolt 5 has an even number of downstream messages (2)
      FP5 = FP2 ⊕ D ⊕ FP3 ⊕ X ⊕ FP4 ⊕ E ⊕ (F ⊕ G)
      FP6 = FP5 ⊕ F ⊕ H
      FP7 = FP5 ⊕ G ⊕ I
      // bolt 8 sends FP8 to the Acker
      FP8 = FP6 ⊕ H ⊕ FP7 ⊕ I
      Final status = R ⊕ FP5 ⊕ FP8
                   = R ⊕ FP5 ⊕ (FP5 ⊕ F) ⊕ (FP5 ⊕ G)
                   = R ⊕ FP5 ⊕ (F ⊕ G)
                   = R ⊕ FP2 ⊕ D ⊕ FP3 ⊕ X ⊕ FP4 ⊕ E
                   = R ⊕ (FP1 ⊕ A ⊕ B ⊕ C) = 0
      Limits and notes:
      1) The number of downstream msgs shall be odd (1, 3, 5, …); otherwise, the bolt must send the new FP to the Acker, which XORs it into the status.
      2) To implement this approach, the bolt ideally needs to know the total number of downstream msgs in order to generate the FP before emitting.
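The derivation on slide 21 can be replayed numerically. The sketch below follows the slide's formulas literally; the helper names `x` and `final_status` and the `ack_fp5` switch are hypothetical, added only to show what happens when the extra ack required by the even fan-out at bolt 5 is skipped.

```python
from functools import reduce
from operator import xor
import secrets


def x(*values):
    """XOR of all arguments."""
    return reduce(xor, values, 0)


def final_status(m, ack_fp5=True):
    """Replay the slide-21 topology; `m` maps message names to their IDs."""
    fp1 = x(m["R"], m["A"], m["B"], m["C"])
    fp2 = x(fp1, m["A"], m["D"])
    fp3 = x(fp1, m["B"], m["X"])
    fp4 = x(fp1, m["C"], m["E"])
    # Bolt 5 joins D, X and E and emits two messages (F, G): even fan-out.
    fp5 = x(fp2, m["D"], fp3, m["X"], fp4, m["E"], m["F"], m["G"])
    fp6 = x(fp5, m["F"], m["H"])
    fp7 = x(fp5, m["G"], m["I"])
    fp8 = x(fp6, m["H"], fp7, m["I"])     # leaf bolt 8 acks FP8
    acks = [fp8] + ([fp5] if ack_fp5 else [])
    return x(m["R"], *acks)               # Acker starts from R


ids = {name: secrets.randbits(64) for name in "RABCDEXFGHI"}
print(final_status(ids) == 0)  # prints True
```

With `ack_fp5=False` the status collapses to R ⊕ F ⊕ G, which is non-zero for distinct IDs: the two copies of FP5 inherited by F and G cancel each other, which is why the slide requires the extra ack when the number of downstream messages is even.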
  • 22. Approach-2: share split
      • For an input rootMsg, INIT a BIG SHARE (8B), EMBED it as metadata and pass it down
      • At each bolt, Storm SPLITs the attached share and EMBEDs the parts in the output msgs; repeat until the leaf ...
      • Only the leaf ACKs to the Acker, reporting the share it received
      • Acker REDO: decrease by the reported share; finally 0 means OK, otherwise failure
        – No random numbers (no collision), no XOR; embedded inline; the split is transparent to the App
        – +/- (mod): follows the commutative & associative laws, which resolves the out-of-order issue
      Like: IPO/stock share, split, increase share
      [Figure: nodes 0-9; Acker entry (srcNodeID: rootMsgID, BIG-Share) for root message A starts at 100; shares along the DAG: B 50, C 50; D 25, E 25; F 17, G 17, H 16; I 25, J 25; K 17, L 17; M 16; leaf acks report 16 and 84]
  • 23. Approach-2: share split (cont’d)
      • Rare case: INCREASE the share if it is insufficient to split (and also sync up with the Acker)
      • The Acker then ADDs the newly increased share (NOT decrease)
      [Figure: nodes 0-9; Acker entry (srcNodeID, RootMsgID, Share) for root message A starts at 100; shares: B 99, C 1; the share of 1 is increased by 99 (A, +99: increase share, sync up with the Acker) and split into F 33, G 33, H 34]
      If S - S1 - S2 - … = Sn, then S - S1 - S2 - … - Sn = 0 (Acks may be batched)
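A minimal Python sketch of the share-split bookkeeping, including the rare increase case, might look as follows. The `split` policy (equal parts, last part takes the remainder) and the `ShareAcker` class are assumptions made for illustration; any split whose parts sum to the parent share would satisfy the invariant S - S1 - S2 - … - Sn = 0.

```python
def split(share, n):
    """Split `share` into n positive integer parts; the last part takes the rest."""
    if n > share:
        raise ValueError("insufficient share: increase it (and tell the Acker) first")
    base = share // n
    return [base] * (n - 1) + [share - base * (n - 1)]


class ShareAcker:
    """Tracks the outstanding share of one root message."""

    def __init__(self, initial_share):
        self.outstanding = initial_share

    def increase(self, amount):
        # A bolt ran out of share and created `amount` more: ADD, not decrease.
        self.outstanding += amount

    def ack(self, share):
        # A leaf reports the share it received.
        self.outstanding -= share

    def ok(self):
        return self.outstanding == 0


# Replay of the increase example: root share 100 is split into B = 99 and C = 1.
acker = ShareAcker(100)
b_share, c_share = 99, 1
acker.increase(99)                 # C cannot be split three ways, so add 99
f, g, h = split(c_share + 99, 3)   # [33, 33, 34]
for leaf_share in (h, b_share, g, f):  # acks may arrive in any order
    acker.ack(leaf_share)
print(acker.ok())  # prints True
```

Because addition and subtraction commute, the Acker can process the increase and the leaf acks in any order; only the final value at the timeout matters.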
  • 24. Prototyping
      • Implemented Approach-2 (share-split)
      • Integrated with Storm 1.0.1 (released in May 2016)
        – Storm core (~200 LOC in Clojure, a LISP-like language) and Java APIs (~200 LOC including some traces/tests)
      • Implementation notes:
        – Support BasicBolt, remove randomNum, re-use some existing structures/APIs, e.g., Anchors-to-ids (RootID:shareAttached), Ack sending
        – Global pre-defined split share at all bolts (equal split)
          • Next: configurable split approach per bolt
        – To split the share exactly, build a 1-step delayed emit
          • Pre-split the input share
          • Once a new tuple is generated, emit queues it internally until the next tuple comes out
          • Finally, explicitly call emitDone(), so that the last tuple takes over all the remaining share and is emitted
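The 1-step delayed emit can be illustrated with a small Python sketch. It is not the prototype's Clojure/Java code: the class `DelayedEmitter`, its `send` callback and the `expected_splits` parameter are hypothetical stand-ins for Storm's collector and the globally pre-defined split count.

```python
class DelayedEmitter:
    """Holds back one tuple so the last tuple can take all the remaining share."""

    def __init__(self, share, expected_splits, send):
        self.remaining = share                       # share received with the input
        self.sub_share = max(1, share // expected_splits)  # static pre-split
        self.pending = None                          # the single queued tuple
        self.send = send                             # hypothetical downstream hook

    def emit(self, tup):
        if self.pending is not None:
            if self.remaining - self.sub_share < 1:
                raise RuntimeError("share exhausted: an increase-share step is needed")
            self._flush(self.sub_share)              # previous tuple gets a sub-share
        self.pending = tup                           # queue the current tuple

    def emit_done(self):
        """Flush the last tuple with everything left; return any unforwarded share."""
        if self.pending is not None:
            self._flush(self.remaining)
        return self.remaining

    def _flush(self, share):
        self.remaining -= share
        self.send(self.pending, share)
        self.pending = None


sent = []
emitter = DelayedEmitter(share=50, expected_splits=4,
                         send=lambda tup, share: sent.append((tup, share)))
for word in ("a", "b", "c"):
    emitter.emit(word)
emitter.emit_done()
print(sent)  # [('a', 12), ('b', 12), ('c', 26)]
```

The bolt did not need to know in advance that it would emit three tuples: the shares still add up to the 50 it received, at the cost of holding each tuple until the next one (or the final `emit_done()`) arrives.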
  • 25. Benchmark
      • Function & performance
        – Network traffic, CPU, latency/throughput
      • Reference: IBM whitepaper (Storm vs. IBM InfoSphere), 7 layers
        – We use Wikipedia as the data source; word processing
      [Figure: three hosts on a 1000 Mbps network, each running Ubuntu 15.10 (4.2.0) and Storm 1.0.1 on E5-2643 @ 3.40GHz, 24 cores, 256GB DRAM]
  • 26. Result: function & performance
      • Function: inject errors and validate reliability detection: Pass
        – Same level of reliability as the existing approach
      • Performance: same HW/SW config and processing logic
        – 16KB tuple, 100 pending, 48 parallelism per bolt
        – 4 workers & 12 Ackers per host
  • 27. Test1: 3 layers
      • 1/3 Ack traffic, 18% faster, 9% less CPU
      Ack traffic (Mil): current 3903, new 1301
      End-to-end latency (ms): current 241, new 197
      CPU (per Java worker): current 350%, new 320%
  • 28. Test2: 7 layers
      • 1/5 Ack traffic, 23% faster, 14% less CPU
      Ack traffic (Mil): current 2685, new 537
      End-to-end latency (ms): current 197, new 151
      CPU (per Java worker): current 250%, new 215%
  • 29. MORE
      • Larger topology? Quick test of 11 layers:
        – 1/9 traffic
      • Presumably, the larger the topology, the greater the gains
      • Next
        – Refine multi-Acker
        – Implement the “Increase Share” operation
        – Configurable split method per bolt
          • So that the developer can specify the desired split rather than a fixed/global one
      • May integrate with Twitter Heron? Or apply to other areas?
        – e.g., function call graph? performance trace? (more…)
  • 30. End-to-end IoT landscape: continuous, scalable, real-time processing
  • 31. Unified data processing
      • Lambda architecture: fusion of “historical” + “new” data
        – Proposed by Nathan Marz (5 years ago), batch + streaming
        – Widely adopted by many Internet companies
  • 32. SUMMARY
      • 2 innovative & inspiring streaming reliability algorithms
        – Guaranteed, with minimized memory footprint
        – More scalable, efficient & fast, and even beautiful
      • Demonstrated in Storm
        – 1/N Ack traffic, only needed at leaf nodes
          • N is the topology depth. Usually there are a few leaves for aggregation, DB saving, etc.
          • Meanwhile, 23% faster, 14% less CPU
        – Transparent to the App except for the final explicit emitDone() call
      • Applying to other interesting areas...
        – Distributed replication, tx, exact-state tracking, …
  • 33. THANK YOU!
      • Feedback or comments? Talk with us!
        – Any flaws, constraints, or room to improve?
        – We will then discuss with the Storm community; code can be shared if needed

Editor's Notes

  1. Any official disclaimer?
  2. May also be known as Complex Event Processing (CEP)
  3. Trident: an abstraction on top of Storm. Besides providing higher-level constructs “a-la-Cascading”, it batches groups of tuples to 1) make reasoning about processing easier and 2) encourage efficient data persistence, even with the help of an API that can provide exactly-once semantics for some cases.
     Heron: built since 2014, paper in 2015, open-sourced in May 2016. API-compatible with Apache Storm, hence no code change. “One of our primary requirements for Heron was ease of debugging and profiling”, also scheduling and optimal resource utilization (IPC layer, simplification).
     Flink: based on distributed checkpoints, “Lightweight Asynchronous Snapshots for Distributed Dataflows” (ABS: Asynchronous Barrier Snapshotting), a variation of the Chandy-Lamport algorithm (1985). It periodically draws state snapshots of a running stream topology and stores these snapshots in durable storage. This is similar to the micro-batching approach, in which all computations between two checkpoints either succeed or fail atomically as a whole. However, the similarities stop there. One great feature of Chandy-Lamport is that we never have to press the “pause” button in stream processing to schedule the next micro-batch. Instead, regular data processing always keeps going, processing events as they come, while checkpoints happen in the background.
  4. If a failure is detected, Storm can redo from the beginning (Storm doesn’t checkpoint), which is usually fast, at the ms level. Spark can redo from the most recent checkpoint (performance impact).
  5. Task failed: restarted by the supervisor daemon. Supervisor/worker node failed: restarted/rescheduled via ZK. Master failed: handled via ZK; new tasks can’t be submitted, but existing tasks should be OK. Redo (re-compute): no log/replica, for high performance or real-time processing.
  7. It doesn’t care which component failed. Once a failure is detected at the time-out (30 sec), the App should not commit the message to a data source like Kafka, so Kafka never removes that data. The App can re-send the message and re-run the topology.
  8. Random ID. Every bolt must send an Ack message.
  9. Another benchmark is IBM InfoSphere vs. Storm:
  10. In practice, there’s a challenge in implementing such an approach. Ideally: 1) we need to know how many downstream msgs are generated, then allocate enough random IDs and calculate the FP; 2) for each downstream msg, embed the FP and emit it downstream. However, for much (maybe not all) processing logic, the total downstream msg count is probably not known beforehand (in step 1), only after the logic has executed.
  11. In practice, there’s a challenge in implementing the approach. Ideally: 1) we need to know how many downstream msgs are generated, then allocate enough random IDs and calculate the FP; 2) for each downstream msg, embed the FP and emit it downstream. However, for much (maybe not all) processing logic, the total downstream msg count is probably not known beforehand (in step 1), only after the logic has executed.
  12. In practice, there’s a challenge in implementing such an approach. Ideally: 1) we need to know how many downstream msgs are generated, then allocate enough random IDs and calculate the FP; 2) for each downstream msg, embed the FP and emit it downstream. However, for much (maybe not all) processing logic, the total downstream msg count is probably not known beforehand (in step 1), only after the logic has executed.
  13. For example, the initial share is 100 at the Acker. A source msg (root msg) is ingested at the root node (spout), which initializes the BIG SHARE as the initial status and embeds the SHARE as part of the metadata passed downstream. As the topology runs, each node executes its pre-defined logic and meanwhile extracts the share and splits it among the downstream output msgs. Finally, the leaf nodes extract the received share and report it to the Acker. The Acker decreases the share: 100 - 16 - 84 = 0, and 0 means OK.
  14. Some rule about increases may be pre-defined, e.g., always increase by 7B; the Acker could then use one bit to indicate one increase.
      Huang’s algorithm is similar but different. Both use a number as a weight or share and involve a split operation, but the problem area, prerequisites and algorithm steps are very different. Huang’s target is more related to process (task/bolt) state, whereas our target is the continuously flowing messages running through the tasks. A few points, feel free to comment:
      – Problem area: in Huang’s context, the distributed task consists of different processes, each either active (it may become idle at any time) or idle (the idle-to-active transition is only triggered by some msg). Huang’s goal is to detect that *all processes* in the system have become idle. Our goal is to track the status of each message running through those tasks, usually related to partial failure (we don’t care which task failed or is unavailable).
      – Prerequisite: importantly, the idle state (the state Huang monitors) is *explicitly known* to the process itself; hence his step “Upon becoming idle, a process sends a message…”. In our case, a message failure/exception is hard for the message itself to know, typically due to network partition, timeout, etc., so it must be detected by other components or by specially designed state, which adds extra challenges.
      – Algorithm: the steps are different. Our method always splits the number while flowing through the DAG; the Acker then essentially redoes the split operation based on the received shares and makes sure the result is 0.
      In general, Huang’s research target is processes (tens or hundreds) rather than continuously flowing messages (billions, or never-ending). In practice, distributed process state is currently managed by ZooKeeper (or Raft, etc.), based on the Paxos algorithm, which was published in 1990 but widely understood and adopted only after 2001 (after Lamport’s second paper explaining Paxos, and Google’s validation).
  15. A few important points in the implementation:
      1. Re-use the existing Anchors-to-ids map to embed the share when emitting (so no extra traffic). Previously it was [RootID -> tupleID]; now it is [RootID -> shareAssigned].
      2. To split the passed-down share, we would need to know beforehand how many downstream outMsgs are generated (usually hard to predict). To resolve that, we worked out the 1-step deferred processing: 1) statically split the input share into sub-shares; 2) assign and embed a sub-share and prepare to emit; 3) internally queue the current outMsg and send the previous msg; 4) emit the last msg with the new API, so that the last outMsg takes over all the remaining share. With this implementation we introduce a little delay, but it is acceptable.
      3. How to split the share is also important. Right now it is simply a pre-defined split method, i.e., all bolts use a pre-defined split count (could be 1 ~ 4096 or larger). In the future, it shall be configurable per bolt by the developer (who presumably knows more about the topology), e.g., bolt1 may split up to 128, bolt2 may split up to 256, etc. An improper split may cause pressure by running out of IDs, so an increase of the share is required; this still depends on the topology size.
  16. IBM InfoSphere vs. Storm:
  17. Various topologies, such as top-down, multiple incoming bolts, multiple spouts, …
  18. CLJ source files: storm-core/src/clj/org/apache/storm/daemon/acker.clj, executor.clj; storm-core/src/clj/org/apache/storm/util.clj
      Java source files: storm-core/src/jvm/org/apache/storm/topology/; storm-core/src/jvm/org/apache/storm/coordination/; storm-core/src/jvm/org/apache/storm/task/; storm-core/src/jvm/org/apache/storm/trident/topology/