Wednesday, October 3, 12
5. Real-time data processing
before Twitter Storm:
network of queues and workers
Message routing can be complex!
[Diagram: MESSAGES QUEUE]
9. Real-time data processing
Queue replication is needed for reliability
Each new computation branch requires routing reconfiguration
Hard to maintain queues
[Diagram: three MESSAGES QUEUE boxes]
22. (Very) basic info
created by Nathan Marz from Backtype/Twitter;
Eclipse Public License 1.0;
open-sourced on September 19th, 2011;
about 16k Java and 7k Clojure LoC;
most watched Java repo on GitHub (> 4k watchers);
active user group.
28. Current status
current stable release: 0.8.1;
0.8.2 with small bug fixes is already on the way;
0.9.0 with major core improvements is planned;
contributions are not very active, so we could try to get involved;
used by over 30 companies (such as Twitter, Groupon,
Alibaba, GumGum, etc.).
34. Key properties
extremely broad set of use cases:
stream processing;
database updating;
distributed RPC;
scalable and extremely robust;
guarantees no data loss;
fault-tolerant;
programming language agnostic.
36. Key concepts
Tuples (ordered list of elements)
(“Saratov”, “slukjanov”, “event1”, “10/3/12 16:20”)
38. Key concepts
Streams (unbounded sequence of tuples)
TUPLE TUPLE TUPLE TUPLE TUPLE
41. Key concepts
Spouts (source of streams)
TUPLE TUPLE TUPLE TUPLE TUPLE
Spouts can talk with:
queues;
logs;
API calls;
event data.
43. Key concepts
Bolts (process tuples and create new streams)
[Diagram: streams of TUPLEs flowing into and out of a bolt]
44. Key concepts
You can do the following things in Bolts:
apply functions / transformations;
filter;
aggregate;
do streaming joins;
access DBs, APIs, etc.
47. Key concepts
Topologies (a directed graph of Spouts and Bolts)
[Diagram: a topology in which streams of TUPLEs flow between spouts and bolts]
49. Key concepts
Tasks (instances of spouts and bolts)
Task 1
Task 2
Task 3
Task 4
67. Grouping
shuffle (randomly and evenly distributed);
local or shuffle (tasks in the same worker process are preferred);
fields (the stream is partitioned by specified fields);
all (the stream is replicated across all the bolt’s tasks);
global (the entire stream goes to a single bolt’s task);
direct (the producer decides which consumer task receives each tuple);
custom (implement interface CustomStreamGrouping).
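The routing rules above can be sketched in plain Java. This is only an illustration of the idea, not Storm's routing code: the class GroupingSketch, its method names, the use of List.hashCode() and the choice of task index 0 for global grouping are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: not Storm's real routing code.
final class GroupingSketch {
    // Fields grouping: equal grouping values always select the same task index.
    static int fieldsGrouping(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // Global grouping: the entire stream goes to one task (index 0 here, by assumption).
    static int globalGrouping() {
        return 0;
    }

    // All grouping: every task of the bolt receives a copy of the tuple.
    static int[] allGrouping(int numTasks) {
        int[] targets = new int[numTasks];
        for (int i = 0; i < numTasks; i++) {
            targets[i] = i;
        }
        return targets;
    }

    public static void main(String[] args) {
        int numTasks = 12;
        for (String word : Arrays.asList("the", "cow", "the")) {
            int task = fieldsGrouping(Arrays.<Object>asList(word), numTasks);
            System.out.println(word + " -> task " + task);
        }
        System.out.println("all -> " + Arrays.toString(allGrouping(3)));
    }
}
```

Both occurrences of "the" land on the same task, which is what lets a word counter keep a correct per-word total.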
79. WordCount sample
[Diagram: SENTENCE GENERATOR → SENTENCE, SENTENCE → SENTENCE SPLITTER → WORD, WORD → GROUP BY WORD → WORD COUNTER → SOUT; PING GENERATOR → PING, PING, PING → WORD COUNTER]
80. WordCount sample
[Diagram: SENTENCE GENERATOR → SENTENCE, SENTENCE → SENTENCE SPLITTER → WORD, WORD → GROUP BY WORD → WORD COUNTER → DB; PING GENERATOR → PING, PING, PING → WORD COUNTER]
81. Sentence generator
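import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;
import java.util.Map;
import java.util.Random;

public class RandSentenceGenerator extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Random random;
    private String[] sentences;

    @Override
    public void open(Map map, TopologyContext ctx, SpoutOutputCollector collector) {
        this.collector = collector;
        this.random = new Random();
        this.sentences = <sentences array>; // array contents are elided on the slide
    }

    @Override
    public void nextTuple() {
        Utils.sleep(10); // throttle: emit roughly one sentence every 10 ms
        String sentence = sentences[random.nextInt(sentences.length)];
        collector.emit(new Values(sentence));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}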
82. Sentence splitter
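import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class SplitSentence extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String sentence = tuple.getString(0);
        // Split on whitespace; emit one tuple per word.
        for (String word : sentence.split("\\s+")) {
            collector.emit(new Values(word));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}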
83. Word count
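import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import com.google.common.collect.HashMultiset;
import java.util.Map;
import org.apache.log4j.Logger;

public class WordCount extends BaseBasicBolt {
    private HashMultiset<String> words = HashMultiset.create();
    // These fields are used on the slide but not declared there.
    private transient Logger logger;
    private String name;
    private int task;

    @Override
    public void prepare(Map conf, TopologyContext ctx) {
        super.prepare(conf, ctx);
        this.logger = Logger.getLogger(this.getClass());
        this.name = ctx.getThisComponentId();
        this.task = ctx.getThisTaskIndex();
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String source = tuple.getSourceComponent();
        if ("split".equals(source)) {
            // A word from the splitter: count it.
            words.add(tuple.getString(0));
        } else if ("ping".equals(source)) {
            // A ping: report the counts accumulated by this task.
            logger.warn("RESULT " + name + ":" + task + " :: " + words);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}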
84. Topology builder
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class WordCounter {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // The last argument is the parallelism hint for the component.
        builder.setSpout("source", new RandSentenceGenerator(), 3);
        builder.setSpout("ping", new PingSpout()); // PingSpout is not shown on the slides
        builder.setBolt("split", new SplitSentence(), 8)
               .shuffleGrouping("source");
        builder.setBolt("count", new WordCount(), 12)
               .fieldsGrouping("split", new Fields("word"))
               .allGrouping("ping");
        <topology submitting>
    }
}
85. Topology submitter
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class WordCounter {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        <building topology>
        Config conf = new Config();
        conf.setDebug(true);
        conf.setNumWorkers(3);
        StormSubmitter.submitTopology("tplg-name", conf, builder.createTopology());
    }
}
89. Multilang support
DSLs for Scala, JRuby and Clojure;
ShellSpout, ShellBolt;
JSON-based protocol:
receive/emit tuples;
ack/fail tuples;
write to logs.
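A rough sketch of what the messages sent by a shell component might look like. The command and field names ("command", "emit", "ack", "fail", "log", "anchors", "tuple", "id", "msg") and the "end" line that terminates each message follow my reading of Storm's multilang protocol documentation; the class MultilangMessages and its helpers are hypothetical, and only string-valued tuples are handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical message builders for illustration; not part of Storm.
final class MultilangMessages {
    // Minimal JSON string quoting (handles only backslashes and double quotes).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String array(List<String> items) {
        return items.stream().map(MultilangMessages::quote)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    // Each message is a JSON object followed by a line containing only "end".
    static String frame(String json) {
        return json + "\nend\n";
    }

    static String emit(List<String> anchors, List<String> tuple) {
        return frame("{\"command\": \"emit\", \"anchors\": " + array(anchors)
                + ", \"tuple\": " + array(tuple) + "}");
    }

    static String ack(String tupleId) {
        return frame("{\"command\": \"ack\", \"id\": " + quote(tupleId) + "}");
    }

    static String fail(String tupleId) {
        return frame("{\"command\": \"fail\", \"id\": " + quote(tupleId) + "}");
    }

    static String log(String message) {
        return frame("{\"command\": \"log\", \"msg\": " + quote(message) + "}");
    }

    public static void main(String[] args) {
        System.out.print(emit(Arrays.asList("123"), Arrays.asList("cow")));
        System.out.print(ack("123"));
        System.out.print(log("processed one tuple"));
    }
}
```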
99. Storm fault-tolerance
Parts of Storm cluster:
Zookeeper nodes;
Nimbus (master) node;
Supervisor nodes.
102. Nimbus as a point of failure
when Nimbus is down:
topologies continue to work;
tasks from failing nodes aren’t respawned;
can’t upload a new topology or rebalance an old one;
impossible to run Nimbus on another node:
either fix the failed node;
or create a new one and resubmit all topologies.
104. Tuple types
spout tuple - emitted from Spouts;
child tuple - emitted from Bolts, based on parent
tuple(s) (child or spout ones).
[“the cow jumped over the moon”] → [“the”] → [“the”, 1]
                                 → [“cow”] → [“cow”, 1]
                                 → [“jumped”] → [“jumped”, 1]
                                 → [“over”] → [“over”, 1]
                                 → [“the”] → [“the”, 2]
                                 → [“moon”] → [“moon”, 1]
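The tuple tree for this sentence can be reproduced with a small plain-Java sketch: one spout tuple (the sentence) yields one child tuple per word, and each word tuple yields a (word, count) tuple. The class TupleTreeSketch is hypothetical and does not use Storm; it only mirrors the data flow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the tuple tree; no Storm classes involved.
final class TupleTreeSketch {
    // Returns the leaf tuples "[word, count]" derived from one spout tuple.
    static List<String> process(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> leaves = new ArrayList<>();
        for (String word : sentence.split("\\s+")) {          // splitter: one child tuple per word
            int count = counts.merge(word, 1, Integer::sum);  // counter: running count per word
            leaves.add("[" + word + ", " + count + "]");
        }
        return leaves;
    }

    public static void main(String[] args) {
        // prints [[the, 1], [cow, 1], [jumped, 1], [over, 1], [the, 2], [moon, 1]]
        System.out.println(process("the cow jumped over the moon"));
    }
}
```

The spout tuple counts as fully processed only once every tuple in this tree has been acked.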
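105. Reliability API guarantees
public class QueueConsumer extends BaseRichSpout {
    ...
    @Override
    public void nextTuple() {
        Message msg = queueClient.popMessage();
        // The second argument is the message ID that Storm passes back to ack()/fail().
        collector.emit(msg.getPayload(), msg.getId());
    }

    @Override
    public void ack(Object msgId) {
        // The whole tuple tree was processed: the message can be removed from the queue.
        queueClient.ack(msgId);
    }

    @Override
    public void fail(Object msgId) {
        // Processing failed or timed out: the message should be redelivered.
        queueClient.fail(msgId);
    }
    ...
}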
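What a queue client behind such a spout has to do can be sketched in plain Java. InMemoryQueueClient and all of its methods are hypothetical stand-ins for the slide's queueClient, assuming a queue that keeps popped messages pending until they are acked and redelivers them when they fail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the slide's queueClient.
final class InMemoryQueueClient {
    private final Deque<String> ready = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 0;

    void push(String payload) {
        ready.addLast(payload);
    }

    // Hands out a message and remembers it until it is acked or failed.
    // Returns the message ID, or null when the queue is empty.
    Long pop() {
        String payload = ready.pollFirst();
        if (payload == null) {
            return null;
        }
        long id = nextId++;
        pending.put(id, payload);
        return id;
    }

    String payloadOf(long id) {
        return pending.get(id);
    }

    // Fully processed: forget the message.
    void ack(long id) {
        pending.remove(id);
    }

    // Processing failed: put the message back so it is delivered again.
    void fail(long id) {
        String payload = pending.remove(id);
        if (payload != null) {
            ready.addFirst(payload);
        }
    }

    int readyCount() {
        return ready.size();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        InMemoryQueueClient queue = new InMemoryQueueClient();
        queue.push("m1");
        Long first = queue.pop();
        queue.fail(first);           // first attempt fails, message is requeued
        Long second = queue.pop();   // same payload is delivered again
        queue.ack(second);
        System.out.println("ready=" + queue.readyCount() + " pending=" + queue.pendingCount());
    }
}
```

Because a failed message is delivered again, each message is processed at least once, possibly more than once.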
114. Disabling reliability API
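globally (run no acker executors):
conf.put(Config.TOPOLOGY_ACKER_EXECUTORS, 0);
per message, in the spout (omit the message ID):
collector.emit(values); // instead of collector.emit(values, msgId)
for a single tuple, in a bolt (emit it unanchored):
collector.emit(values); // instead of collector.emit(parentTuples, values)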
122. Acker system impl
every tuple is assigned a random 64-bit ID
Spout --[1]--> Bolt A --[2]--> Bolt B --[3]--> Bolt C
[1] emit
[2] emit
[1] ack
[3] emit
[2] ack
[3] ack
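A minimal sketch of how such a sequence can be tracked in constant memory. Storm's documentation describes the acker as keeping one 64-bit value per spout tuple and XOR-ing into it every tuple ID that is emitted or acked; the class AckerSketch below is a hypothetical illustration of that idea, not Storm's implementation.

```java
import java.util.Random;

// Hypothetical illustration of XOR-based ack tracking for one spout tuple.
final class AckerSketch {
    private long ackVal = 0L;

    // A tuple with this ID was emitted into the tree.
    void emit(long tupleId) {
        ackVal ^= tupleId;
    }

    // The tuple with this ID was acked.
    void ack(long tupleId) {
        ackVal ^= tupleId;
    }

    // Every ID has been XOR-ed in twice (emit + ack), so the value is back to zero.
    boolean isComplete() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        long id1 = random.nextLong();
        long id2 = random.nextLong();
        long id3 = random.nextLong();

        AckerSketch acker = new AckerSketch();
        acker.emit(id1);
        acker.emit(id2);
        acker.ack(id1);
        acker.emit(id3);
        acker.ack(id2);
        acker.ack(id3);
        System.out.println("tree complete: " + acker.isComplete()); // prints "tree complete: true"
    }
}
```

With random 64-bit IDs, the value is very unlikely to reach zero before the tree is really finished.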
130. Correctness of the tracking
bolt fails before sending ack for a tuple:
no ack arrives before the timeout, so the spout tuple fails;
acker fails before acking tuple tree processing:
-- the same as above --;
spout fails before acking message:
the message source should handle the client’s death.
134. Reliability API - Conclusion
easy to disable:
one message - at most once processing;
when used, little overhead and high durability:
one message - at least once processing;
with some further work (transactions, Trident API):
one message - exactly once processing.
139. Transactional approach: design #1
MESSAGE → Spout → TUPLE → Bolt A → COMMIT
input provides messages in strict order;
each message is assigned Transaction ID;
if (curr_tx_id > prev_tx_id) commit(result, curr_tx_id).
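The commit rule can be sketched as a counter that stores the last committed transaction ID next to its value, so that a replayed message is not applied twice. TransactionalCounter is a hypothetical class; the sketch assumes transaction IDs start at 0 and only increase, and that a real store would write the value and the ID atomically.

```java
// Hypothetical illustration of a transaction-ID-guarded commit.
final class TransactionalCounter {
    private long count = 0;
    private long lastTxId = -1; // assumption: transaction IDs start at 0 and only increase

    // Applies the update only if this transaction has not been committed yet.
    // Returns true if the update was applied, false if it was a replay.
    boolean commit(long delta, long txId) {
        if (txId > lastTxId) {
            count += delta;
            lastTxId = txId;
            return true;
        }
        return false;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        TransactionalCounter counter = new TransactionalCounter();
        counter.commit(5, 0);
        counter.commit(5, 0); // replay of transaction 0: ignored
        counter.commit(3, 1);
        System.out.println(counter.count()); // prints 8
    }
}
```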
140. Transactional approach: design #2
BATCH OF MESSAGES → Spout → BATCH OF TUPLES → Bolt A → COMMIT
input provides messages in strict order;
each batch of messages is assigned Transaction ID;
if (curr_tx_id > prev_tx_id) commit(result, curr_tx_id).
143. Transactional approach: design #3
the same as #2, but each transaction is split:
processing phase;
commit phase;
processing phases of different transactions may overlap;
commit phases run in strict order.
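A small sketch of this split, assuming the processing phase is just "count the items in the batch": all batches are processed concurrently on a thread pool, while results are committed one by one in transaction-ID order. PipelinedTransactions, the pool size and the commit-log format are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: overlapping processing, strictly ordered commits.
final class PipelinedTransactions {
    static List<String> run(List<List<String>> batches) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<String> commitLog = new ArrayList<>();
        try {
            // Processing phase: every batch is submitted up front and may run concurrently.
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Integer> processing = () -> batch.size();
                results.add(pool.submit(processing));
            }
            // Commit phase: wait for each result and commit in transaction-ID order.
            for (int txId = 0; txId < results.size(); txId++) {
                int result = results.get(txId).get();
                commitLog.add("tx " + txId + " committed " + result);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return commitLog;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"),
                Arrays.asList("d", "e", "f"));
        run(batches).forEach(System.out::println);
    }
}
```

Even if a later batch finishes processing first, its commit waits until all earlier transactions have been committed.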
150. Trident API: Intro
high-level abstraction for doing realtime computations;
high throughput (millions of messages per second);
stateful stream processing;
low latency distributed querying;
different processing semantics (including exactly-once);
something like Pig or Cascading.
157. Trident API: Demo
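import backtype.storm.Config;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.DRPCClient;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.operation.builtin.FilterNull;
import storm.trident.operation.builtin.MapGet;
import storm.trident.operation.builtin.Sum;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.testing.Split;

TridentTopology topology = new TridentTopology();

// Stream 1: split sentences into words and keep a persistent count per word.
TridentState wordCounts =
    topology.newStream("spout1", new FixedBatchSpout()) // constructor arguments are elided on the slide
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(),
                             new Fields("count"))
        .parallelismHint(6);

// Stream 2: a DRPC query that sums the stored counts of the requested words.
topology.newDRPCStream("words")
    .each(new Fields("args"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
    .each(new Fields("count"), new FilterNull())
    .aggregate(new Fields("count"), new Sum(), new Fields("sum"));

Config config = new Config();
config.setMaxSpoutPending(100);
// "cluster" is created elsewhere; its setup is not shown on the slide.
cluster.submitTopology("word-count-tplg", config, topology.build());

DRPCClient client = new DRPCClient("drpc.server.host", 3772);
System.out.println(client.execute("words", "cat dog the man"));
System.out.println(client.execute("words", "cat"));
// prints the JSON-encoded result, e.g.: "[[5078]]"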