Storm
Anatomy
Eiichiro Uchiumi
http://www.eiichiro.org/
About Me
Eiichiro Uchiumi
• A solutions architect at
working in emerging enterprise
technologies
- Cloud transformation
- Enterprise mobility
- Information optimization (big data)
https://github.com/eiichiro
@eiichirouchiumi
http://www.facebook.com/eiichiro.uchiumi
What is Stream Processing?
Stream processing is a technical paradigm for processing
high-volume, unbounded sequences of tuples in real time
• Algorithmic trading
• Sensor data monitoring
• Continuous analytics
[Diagram: a source feeding a stream into a stream processor]
What is Storm?
Storm is a distributed realtime computation system:
• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to set up & operate
• Free & open source
- Originally developed by Nathan Marz at BackType (acquired by Twitter)
- Written in Java and Clojure
Conceptual View
[Diagram: spouts emitting streams of tuples into a graph of bolts]
Spout:
Source of streams
Bolt:
Consumer of streams that does some processing and possibly emits new tuples
Stream:
Unbounded sequence of tuples
Tuple:
List of name-value pairs
Topology:
Graph of computation composed of spouts and bolts as the nodes and streams as the edges
Physical View
[Diagram: Nimbus, ZooKeeper nodes and Supervisors; each Supervisor has N Workers, each Worker has N Executors, each Executor has N Tasks]
Nimbus:
Master daemon process responsible for
• distributing code
• assigning tasks
• monitoring failures
ZooKeeper:
Stores cluster operational state
Supervisor:
Worker daemon process listening for work assigned to its node (worker node)
Worker:
Java process (worker process) that executes a subset of a topology
Executor:
Java thread spawned by a worker; runs one or more tasks of the same component
Task:
Component (spout/bolt) instance that performs the actual data processing
Spout
import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class RandomSentenceSpout extends BaseRichSpout {
    SpoutOutputCollector collector;
    Random random;

    // Called once when the spout task is initialized.
    @Override
    public void open(Map conf, TopologyContext context,
            SpoutOutputCollector collector) {
        this.collector = collector;
        random = new Random();
    }

    // Emits one randomly chosen sentence per call.
    @Override
    public void nextTuple() {
        String[] sentences = new String[] {
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature"
        };
        String sentence = sentences[random.nextInt(sentences.length)];
        collector.emit(new Values(sentence));
    }

    // Names the single field of each emitted tuple.
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    @Override
    public void ack(Object msgId) {}

    @Override
    public void fail(Object msgId) {}
}
Bolt
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {
    OutputCollector collector;

    // Called once when the bolt task is initialized.
    @Override
    public void prepare(Map stormConf, TopologyContext context,
            OutputCollector collector) {
        this.collector = collector;
    }

    // Splits the incoming sentence on whitespace and emits one tuple per word.
    @Override
    public void execute(Tuple input) {
        for (String s : input.getString(0).split("\\s+")) {
            collector.emit(new Values(s));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
Topology
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentence", new RandomSentenceSpout(), 2);
        builder.setBolt("split", new SplitSentenceBolt(), 4)
                .shuffleGrouping("sentence")
                .setNumTasks(8);
        builder.setBolt("count", new WordCountBolt(), 6)
                .fieldsGrouping("split", new Fields("word"));

        Config config = new Config();
        config.setNumWorkers(4);

        StormSubmitter.submitTopology("wordcount", config, builder.createTopology());

        // Local testing
        // LocalCluster cluster = new LocalCluster();
        // cluster.submitTopology("wordcount", config, builder.createTopology());
        // Thread.sleep(10000);
        // cluster.shutdown();
    }
}
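The topology above uses fieldsGrouping on "word" so that every occurrence of a word reaches the same "count" task. A minimal sketch of that idea in plain Java follows; the class FieldsGroupingSketch and the hash-modulo rule are illustrative assumptions, not Storm's actual routing code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of fields grouping; not part of the Storm API.
final class FieldsGroupingSketch {
    // Maps the grouping field values to a task index in [0, numTasks).
    // Equal values always hash to the same task.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 6; // tasks of the "count" bolt in the example topology
        for (String word : "the cow jumped over the moon".split("\\s+")) {
            int task = chooseTask(Arrays.<Object>asList(word), numTasks);
            System.out.println(word + " -> task " + task);
        }
    }
}
```

Both occurrences of "the" print the same task index, which is what lets a per-task word counter stay correct without shared state.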
Starting Topology
[Sequence diagram: StormSubmitter, Nimbus (Thrift server), ZooKeeper]
> bin/storm jar
• Uploads topology JAR to Nimbus' inbox with dependencies
• Submits topology configuration as JSON and structure as Thrift
• Copies topology JAR, configuration and structure into local file system
• Sets up static information for topology
• Makes assignment
• Starts topology
Starting Topology
[Sequence diagram: Nimbus (Thrift server), ZooKeeper, Supervisor, Worker, Executor, Task]
• Downloads topology JAR, configuration and structure
• Writes assignment on its node into local file system
• Starts worker based on the assignment
• Refreshes connections
• Makes executors
• Makes tasks
• Starts processing
Extremely Significant Performance
Parallelism
RandomSentenceSpout: parallelism hint = 2, number of tasks = not specified = same as parallelism hint = 2
SplitSentenceBolt: parallelism hint = 4, number of tasks = 8
WordCountBolt: parallelism hint = 6, number of tasks = not specified = 6
Number of topology workers = 4
Number of worker slots / node = 4
Number of worker nodes = 2
Number of executor threads = 2 + 4 + 6 = 12
Number of component instances = 2 + 8 + 6 = 16
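The arithmetic above can be reproduced with a few lines of plain Java. This is only a sketch: ParallelismMath and its Component helper are hypothetical names, and the numbers are the ones from the word count topology.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for checking the executor and task counts by hand.
final class ParallelismMath {
    static final class Component {
        final int executors;
        final int tasks;

        Component(int parallelismHint, Integer numTasks) {
            this.executors = parallelismHint;
            // When the task count is not specified, it equals the parallelism hint.
            this.tasks = (numTasks == null) ? parallelismHint : numTasks;
        }
    }

    static Map<String, Component> wordCountTopology() {
        Map<String, Component> components = new LinkedHashMap<>();
        components.put("sentence", new Component(2, null));
        components.put("split", new Component(4, 8));
        components.put("count", new Component(6, null));
        return components;
    }

    static int totalExecutors(Map<String, Component> components) {
        int sum = 0;
        for (Component c : components.values()) sum += c.executors;
        return sum;
    }

    static int totalTasks(Map<String, Component> components) {
        int sum = 0;
        for (Component c : components.values()) sum += c.tasks;
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Component> components = wordCountTopology();
        System.out.println("executor threads = " + totalExecutors(components));   // 12
        System.out.println("component instances = " + totalTasks(components));    // 16
    }
}
```

With 8 tasks on 4 executors, each SplitSentenceBolt executor runs two tasks, while the other components run one task per executor.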
[Diagram: two worker nodes whose worker processes run executor threads hosting 2 RandomSentenceSpout (RS), 8 SplitSentenceBolt (SS) and 6 WordCountBolt (WC) tasks]
A topology can be spread out manually, without downtime, when a worker node is added
Message Passing
[Diagram: a worker process containing a receive thread (from other workers), executors, and a transfer thread (to other workers), connected by receiver queues, internal transfer queues and a transfer queue]
Interprocess communication is mediated by ZeroMQ
Outside transfer is done with Kryo serialization
Local communication is mediated by LMAX Disruptor
Inside transfer is done with no serialization
LMAX Disruptor
Large concurrent "magic" ring buffer that can be used like a blocking queue
• The consumer can easily keep up with the producer by batching
• CPU cache friendly
- The ring is implemented as an array, so the entries can be preloaded
• GC safe
- The entries are preallocated up front and live forever
6 million orders per second can be processed on a single thread at LMAX
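The three properties above (array-backed ring, preallocated entries, batched consumption) can be illustrated with a small single-producer, single-consumer ring buffer. This is a simplified sketch, not the LMAX Disruptor API: RingBufferSketch, tryPublish and drain are hypothetical names, and the sketch assumes exactly one producer thread and one consumer thread.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical, simplified ring buffer for one producer and one consumer.
final class RingBufferSketch {
    /** Mutable slot that is allocated once and reused forever. */
    static final class Entry {
        long value;
    }

    private final Entry[] ring;
    private final int mask;
    private final AtomicLong published = new AtomicLong(0); // next sequence to write
    private final AtomicLong consumed = new AtomicLong(0);  // next sequence to read

    RingBufferSketch(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        ring = new Entry[sizePowerOfTwo];
        for (int i = 0; i < ring.length; i++) {
            ring[i] = new Entry(); // preallocated up front, never garbage collected
        }
        mask = sizePowerOfTwo - 1;
    }

    /** Writes a value into the next slot; returns false when the ring is full. */
    boolean tryPublish(long value) {
        long seq = published.get();
        if (seq - consumed.get() == ring.length) {
            return false;
        }
        ring[(int) (seq & mask)].value = value;
        published.set(seq + 1); // makes the slot visible to the consumer
        return true;
    }

    /** Hands every published-but-unread value to the handler in one batch. */
    int drain(LongConsumer handler) {
        long from = consumed.get();
        long to = published.get();
        for (long seq = from; seq < to; seq++) {
            handler.accept(ring[(int) (seq & mask)].value);
        }
        consumed.set(to); // frees the slots for the producer
        return (int) (to - from);
    }
}
```

A consumer that falls behind catches up by draining all pending entries in one call, which is the batching effect the slide refers to.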
Fault-tolerance
Cluster works normally
[Diagram: interactions among Nimbus, ZooKeeper, Supervisor and Worker]
• Monitoring cluster state
• Synchronizing assignment
• Sending heartbeat
• Reading worker heartbeat from local file system
• Sending executor heartbeat
Fault-tolerance
Nimbus goes down
Processing will still continue, but topology lifecycle operations and the reassignment facility are lost
Fault-tolerance
Worker node goes down
Nimbus will reassign the tasks to other machines and the processing will continue
Fault-tolerance
Supervisor goes down
Processing will still continue, but the assignment is never synchronized
Fault-tolerance
Worker process goes down
The Supervisor will restart the worker process and the processing will continue
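The failure scenarios above all rest on heartbeats: a component is considered down when its heartbeat stops arriving. A minimal sketch of that detection step follows; HeartbeatMonitor, its method names and the timeout value are assumptions for illustration and do not mirror Storm's internal classes or defaults.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical heartbeat tracker; not Storm's implementation.
final class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastBeat = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that the given worker reported in at time nowMillis. */
    void heartbeat(String workerId, long nowMillis) {
        lastBeat.put(workerId, nowMillis);
    }

    /** Returns the workers whose last heartbeat is older than the timeout. */
    List<String> timedOut(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(30_000); // assumed timeout
        monitor.heartbeat("worker-1", 0);
        monitor.heartbeat("worker-2", 20_000);
        // At t = 40 s only worker-1 has been silent for more than 30 s.
        System.out.println(monitor.timedOut(40_000));
    }
}
```

Whoever runs such a check decides the recovery action: restarting a worker process locally, or reassigning its tasks to another machine.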
Reliability API
public class RandomSentenceSpout extends BaseRichSpout {
    public void nextTuple() {
        ...;
        UUID msgId = getMsgId();
        // Emitting tuple with message id
        collector.emit(new Values(sentence), msgId);
    }

    public void ack(Object msgId) {
        // Do something with acked message id.
    }

    public void fail(Object msgId) {
        // Do something with failed message id.
    }
}

public class SplitSentenceBolt extends BaseRichBolt {
    public void execute(Tuple input) {
        for (String s : input.getString(0).split("\\s+")) {
            // Anchoring incoming tuple to outgoing tuples
            collector.emit(input, new Values(s));
        }

        // Sending ack
        collector.ack(input);
    }
}
Tuple tree: "the cow jumped over the moon" -> "the", "cow", "jumped", "over", "the", "moon"
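The ack and fail callbacks above are left as "do something". One common choice is to remember each in-flight sentence by message id and queue it for replay on failure. The sketch below shows that bookkeeping in plain Java; PendingSentences and its methods are hypothetical and not part of the Storm API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;

// Hypothetical bookkeeping a spout could delegate to from nextTuple/ack/fail.
final class PendingSentences {
    private final Map<UUID, String> inFlight = new HashMap<>();
    private final Queue<String> replay = new ArrayDeque<>();

    /** Registers a sentence about to be emitted and returns its message id. */
    UUID track(String sentence) {
        UUID msgId = UUID.randomUUID();
        inFlight.put(msgId, sentence);
        return msgId;
    }

    /** The tuple tree completed: forget the sentence. */
    void ack(Object msgId) {
        inFlight.remove(msgId);
    }

    /** The tuple tree failed: queue the sentence to be emitted again. */
    void fail(Object msgId) {
        String sentence = inFlight.remove(msgId);
        if (sentence != null) {
            replay.add(sentence);
        }
    }

    /** Returns the next sentence to re-emit, or null when there is none. */
    String pollReplay() {
        return replay.poll();
    }

    int inFlightCount() {
        return inFlight.size();
    }
}
```

A spout using this sketch would check pollReplay() first in nextTuple() and call track() to obtain the message id it passes to emit.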
Acking Framework
[Diagram: RandomSentenceSpout, SplitSentenceBolt and WordCountBolt exchanging tuples A, B and C, and sending "acker init", "acker ack" and "acker fail" messages to the acker implicit bolt]
The acker implicit bolt keeps, for each spout tuple id, the spout task id and a 64-bit number called the "ack val"
The ack val is updated as follows; once it has become 0, the acker implicit bolt knows the tuple tree has been completed
• Emitted tuple A, XOR tuple A id with ack val
• Emitted tuple B, XOR tuple B id with ack val
• Emitted tuple C, XOR tuple C id with ack val
• Acked tuple A, XOR tuple A id with ack val
• Acked tuple B, XOR tuple B id with ack val
• Acked tuple C, XOR tuple C id with ack val
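The XOR bookkeeping listed above can be traced with a short, self-contained sketch. AckValSketch is a hypothetical class; it models only the arithmetic on the ack val, not the acker bolt itself.

```java
import java.util.Random;

// Hypothetical model of the ack val for a single spout tuple.
final class AckValSketch {
    private long ackVal = 0L;

    /** A tuple was emitted: XOR its id into the ack val. */
    void emitted(long tupleId) {
        ackVal ^= tupleId;
    }

    /** A tuple was acked: XOR the same id again, cancelling it out. */
    void acked(long tupleId) {
        ackVal ^= tupleId;
    }

    /** The tree is complete once every emitted id has been cancelled. */
    boolean treeCompleted() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        long a = random.nextLong();
        long b = random.nextLong();
        long c = random.nextLong();

        AckValSketch acker = new AckValSketch();
        acker.emitted(a);
        acker.emitted(b);
        acker.emitted(c);
        acker.acked(b); // acks may arrive in any order
        acker.acked(a);
        acker.acked(c);
        System.out.println(acker.treeCompleted()); // true
    }
}
```

Because XOR is commutative and self-inverse, a fixed 64 bits per spout tuple is enough regardless of how large the tuple tree grows.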
Cluster Setup
• Set up a ZooKeeper cluster
• Install dependencies on Nimbus and worker machines
- ZeroMQ 2.1.7 and JZMQ
- Java 6 and Python 2.6.6
- unzip
• Download and extract a Storm release to Nimbus and worker machines
• Fill in the mandatory configuration in storm.yaml
• Launch daemons under supervision using the “storm” script
[Screenshots: Cluster Summary, Topology Summary, Component Summary]
Basic Resources
• Storm is available at
- http://storm-project.net/
- https://github.com/nathanmarz/storm
under Eclipse Public License 1.0
• Get help on
- http://groups.google.com/group/storm-user
- #storm-user freenode room
• Follow
- @stormprocessor and @nathanmarz
for updates on the project
Many Contributions
• Community repository for modules to use Storm at
- https://github.com/nathanmarz/storm-contrib
including integration with Redis, Kafka, MongoDB, HBase, JMS, Amazon SQS and so on
• Good articles for understanding Storm internals
- http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
- http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
• Good slides for understanding real-life examples
- http://www.slideshare.net/DanLynn1/storm-as-deep-into-realtime-data-processing-as-you-can-get-in-30-minutes
- http://www.slideshare.net/KrishnaGade2/storm-at-twitter
Features on Deck
• Current release: 0.8.2 as of 6/28/2013
• Work in progress (older): 0.8.3-wip3
- Some bug fixes
• Work in progress (newest): 0.9.0-wip19
- SLF4J and Logback
- Pluggable tuple serialization and blowfish encryption
- Pluggable interprocess messaging and Netty implementation
- Some bug fixes
- And more
Advanced Topics
• Distributed RPC
• Transactional topologies
• Trident
• Using non-JVM languages with Storm
• Unit testing
• Patterns
These are not described in this presentation, so check them out yourself, or in my upcoming session if I get the chance :)
Thank You

Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Storm Anatomy

  • 2. About Me
      Eiichiro Uchiumi
      - A solutions architect working in emerging enterprise technologies: cloud transformation, enterprise mobility, and information optimization (big data)
      - https://github.com/eiichiro
      - @eiichirouchiumi
      - http://www.facebook.com/eiichiro.uchiumi
  • 3. What is Stream Processing?
      Stream processing is a technical paradigm for processing a large-volume, unbounded sequence of tuples in real time. Typical uses:
      - Algorithmic trading
      - Sensor data monitoring
      - Continuous analytics
      [Diagram: source, stream, stream processor]
  • 4. What is Storm?
      Storm is a distributed realtime computation system that is:
      - Fast and scalable
      - Fault-tolerant
      - Guaranteed to process messages
      - Easy to set up and operate
      - Free and open source
      It was originally developed by Nathan Marz at BackType (acquired by Twitter) and is written in Java and Clojure.
  • 5. Conceptual View
      - Tuple: a list of name-value pairs
      - Stream: an unbounded sequence of tuples
      - Spout: a source of streams
      - Bolt: a consumer of streams that does some processing and possibly emits new tuples
      - Topology: a graph of computation with spouts and bolts as the nodes and streams as the edges
      [Diagram: spouts emitting tuples into a graph of bolts]
  • 6. Physical View
      - Nimbus: master daemon process responsible for distributing code, assigning tasks, and monitoring failures
      - ZooKeeper: stores the cluster's operational state
      - Supervisor: worker daemon process that listens for work assigned to its node
      - Worker: Java process that executes a subset of a topology
      - Executor: Java thread spawned by a worker; runs one or more tasks of the same component
      - Task: component (spout/bolt) instance that performs the actual data processing
      [Diagram: Nimbus, ZooKeeper nodes and Supervisors; Worker * N per worker node, Executor * N per worker process, Task * N per executor]
  • 7. Spout
      import java.util.Map;
      import java.util.Random;

      import backtype.storm.spout.SpoutOutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichSpout;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Values;

      public class RandomSentenceSpout extends BaseRichSpout {
          SpoutOutputCollector collector;
          Random random;

          // Called once when the spout task is initialized.
          @Override
          public void open(Map conf, TopologyContext context,
                  SpoutOutputCollector collector) {
              this.collector = collector;
              random = new Random();
          }

          // Called repeatedly; emits one randomly chosen sentence per call.
          @Override
          public void nextTuple() {
              String[] sentences = new String[] {
                  "the cow jumped over the moon",
                  "an apple a day keeps the doctor away",
                  "four score and seven years ago",
                  "snow white and the seven dwarfs",
                  "i am at two with nature"
              };
              String sentence = sentences[random.nextInt(sentences.length)];
              collector.emit(new Values(sentence));
          }
  • 8. Spout (continued)
          // Names the single field carried by every emitted tuple.
          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("sentence"));
          }

          @Override
          public void ack(Object msgId) {}

          @Override
          public void fail(Object msgId) {}
      }
  • 9. Bolt
      import java.util.Map;

      import backtype.storm.task.OutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichBolt;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Tuple;
      import backtype.storm.tuple.Values;

      public class SplitSentenceBolt extends BaseRichBolt {
          OutputCollector collector;

          @Override
          public void prepare(Map stormConf, TopologyContext context,
                  OutputCollector collector) {
              this.collector = collector;
          }

          // Splits the incoming sentence on whitespace and emits one tuple per word.
          @Override
          public void execute(Tuple input) {
              for (String s : input.getString(0).split("\\s+")) {
                  collector.emit(new Values(s));
              }
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word"));
          }
      }
  • 10. Topology
      import backtype.storm.Config;
      import backtype.storm.LocalCluster;
      import backtype.storm.StormSubmitter;
      import backtype.storm.topology.TopologyBuilder;
      import backtype.storm.tuple.Fields;

      public class WordCountTopology {
          public static void main(String[] args) throws Exception {
              TopologyBuilder builder = new TopologyBuilder();
              // Component id, component instance, parallelism hint.
              builder.setSpout("sentence", new RandomSentenceSpout(), 2);
              builder.setBolt("split", new SplitSentenceBolt(), 4)
                      .shuffleGrouping("sentence")
                      .setNumTasks(8);
              // WordCountBolt is not shown in this deck.
              builder.setBolt("count", new WordCountBolt(), 6)
                      .fieldsGrouping("split", new Fields("word"));

              Config config = new Config();
              config.setNumWorkers(4);

              StormSubmitter.submitTopology("wordcount", config, builder.createTopology());

              // Local testing
              // LocalCluster cluster = new LocalCluster();
              // cluster.submitTopology("wordcount", config, builder.createTopology());
              // Thread.sleep(10000);
              // cluster.shutdown();
          }
      }
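The fieldsGrouping("split", new Fields("word")) call in the topology slide means that tuples with equal values in the grouping field go to the same task, so every occurrence of a word reaches the same WordCountBolt instance. The following is a minimal sketch of how such routing can work, assuming a simple hash-modulo scheme. `FieldsGroupingSketch` and `chooseTask` are hypothetical names invented for this example and are not Storm API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the Storm API.
final class FieldsGroupingSketch {

    // Picks a target task index from the values of the grouping fields.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 6; // the "count" bolt in the slide has 6 tasks
        for (String word : "the cow jumped over the moon".split("\\s+")) {
            int task = chooseTask(Arrays.<Object>asList(word), numTasks);
            // Both occurrences of "the" print the same task index.
            System.out.println(word + " -> task " + task);
        }
    }
}
```

A shuffle grouping, by contrast, needs no such stable mapping, which is why it is used for the stateless "split" bolt while the counting bolt uses a fields grouping.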
  • 11. Starting Topology
      > bin/storm jar
      StormSubmitter:
      - Uploads the topology JAR, with its dependencies, to Nimbus' inbox
      - Submits the topology configuration as JSON and the structure as Thrift
      Nimbus (Thrift server):
      - Copies the topology JAR, configuration and structure into the local file system
      - Sets up static information for the topology
      - Makes the assignment
      - Starts the topology
      [Diagram: StormSubmitter, Nimbus (Thrift server), ZooKeeper]
  • 12. Starting Topology
      Supervisor:
      - Downloads the topology JAR, configuration and structure
      - Writes the assignment for its node into the local file system
      - Starts workers based on the assignment
      Worker:
      - Refreshes connections
      - Makes executors
      - Makes tasks
      - Starts processing
      [Diagram: Nimbus (Thrift server), ZooKeeper, Supervisor, Worker, Executor, Task]
  • 15. Parallelism
      Component              Parallelism hint   Number of tasks
      RandomSentence Spout   2                  not specified = same as parallelism hint = 2
      SplitSentence Bolt     4                  8
      WordCount Bolt         6                  not specified = 6
      - Number of topology workers = 4
      - Number of worker slots per node = 4
      - Number of worker nodes = 2
      - Number of executor threads = 2 + 4 + 6 = 12
      - Number of component instances = 2 + 8 + 6 = 16
      [Diagram: RS Spout, SS Bolt and WC Bolt executor threads spread over the worker processes of two worker nodes]
      A topology can be spread out manually, without downtime, when a worker node is added.
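The arithmetic on the parallelism slide can be reproduced in a few lines. This is only a sketch of the counting rules stated on the slide (executors are the sum of the parallelism hints; a component's task count falls back to its hint when not specified). `ParallelismSketch` and its methods are hypothetical names, and the even split of executors over workers is an assumption made for illustration.

```java
// Hypothetical helper for illustration; not part of the Storm API.
final class ParallelismSketch {

    // One executor thread is spawned per unit of parallelism hint.
    static int totalExecutors(int[] parallelismHints) {
        int sum = 0;
        for (int hint : parallelismHints) {
            sum += hint;
        }
        return sum;
    }

    // A component's task count defaults to its parallelism hint
    // when it is not specified (encoded here as 0).
    static int totalTasks(int[] parallelismHints, int[] numTasks) {
        int sum = 0;
        for (int i = 0; i < parallelismHints.length; i++) {
            sum += numTasks[i] > 0 ? numTasks[i] : parallelismHints[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] hints = {2, 4, 6}; // spout, split bolt, count bolt
        int[] tasks = {0, 8, 0}; // only the split bolt sets setNumTasks(8)
        int workers = 4;

        int executors = totalExecutors(hints);
        System.out.println("executors = " + executors);                      // 12
        System.out.println("tasks = " + totalTasks(hints, tasks));           // 16
        // Assumes executors are spread evenly over the worker processes.
        System.out.println("executors per worker = " + executors / workers); // 3
    }
}
```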
  • 16. Message Passing
      - Interprocess communication is mediated by ZeroMQ; transfer to the outside is done with Kryo serialization
      - Local communication is mediated by LMAX Disruptor; transfer on the inside is done with no serialization
      [Diagram: a worker process with a receive thread (from other workers), executors with receiver queues and internal transfer queues, a transfer queue, and a transfer thread (to other workers)]
  • 17. LMAX Disruptor
      A large concurrent "magic" ring buffer that can be used like a blocking queue between a producer and a consumer.
      - The consumer can easily keep up with the producer by batching
      - CPU cache friendly: the ring is implemented as an array, so the entries can be preloaded
      - GC safe: the entries are preallocated up front and live forever
      At LMAX, 6 million orders per second can be processed on a single thread.
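To make the three properties on the LMAX Disruptor slide concrete (array-backed ring, preallocated entries, batched consumption), here is a deliberately simplified single-producer, single-consumer ring buffer. It is a sketch under those assumptions and not the LMAX Disruptor API: the real library adds wait strategies, cache-line padding and multi-producer support. `RingBufferSketch`, `offer` and `drain` are names invented for this example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical single-producer, single-consumer sketch; not the LMAX Disruptor API.
final class RingBufferSketch {

    // Mutable slot, allocated once and reused for every message (no garbage per message).
    static final class Entry {
        long value;
    }

    private final Entry[] entries;
    private final int mask;
    private final AtomicLong published = new AtomicLong(-1); // last sequence written
    private final AtomicLong consumed = new AtomicLong(-1);  // last sequence read

    RingBufferSketch(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        entries = new Entry[size];
        for (int i = 0; i < size; i++) {
            entries[i] = new Entry(); // preallocate every slot up front
        }
        mask = size - 1;
    }

    // Producer side: returns false instead of overwriting when the ring is full.
    boolean offer(long value) {
        long next = published.get() + 1;
        if (next - consumed.get() > entries.length) {
            return false;
        }
        entries[(int) (next & mask)].value = value;
        published.set(next); // volatile write makes the slot visible to the consumer
        return true;
    }

    // Consumer side: handles everything published so far as one batch.
    int drain(LongConsumer handler) {
        long available = published.get();
        int count = 0;
        for (long seq = consumed.get() + 1; seq <= available; seq++) {
            handler.accept(entries[(int) (seq & mask)].value);
            count++;
        }
        consumed.set(available); // frees the drained slots for the producer
        return count;
    }

    public static void main(String[] args) {
        RingBufferSketch ring = new RingBufferSketch(4);
        for (long v = 1; v <= 5; v++) {
            System.out.println("offer " + v + ": " + ring.offer(v));
        }
        long[] sum = {0};
        int n = ring.drain(v -> sum[0] += v);
        System.out.println("drained " + n + " entries, sum = " + sum[0]);
    }
}
```

Draining in batches is what lets a consumer that briefly falls behind catch up in one pass, which is the "keep up with producer by batching" point on the slide.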
  • 19. Fault-tolerance: cluster works normally
      [Diagram: ZooKeeper, Nimbus, Supervisor, Worker; flows: monitoring cluster state, synchronizing assignment, sending heartbeat, reading worker heartbeat from local file system, sending executor heartbeat]
  • 20. Fault-tolerance: Nimbus goes down
      Processing still continues, but topology lifecycle operations and the reassignment facility are lost.
  • 21. Fault-tolerance: worker node goes down
      Nimbus reassigns the tasks to other machines and processing continues.
  • 22. Fault-tolerance: Supervisor goes down
      Processing still continues, but the assignment is never synchronized.
  • 23. Fault-tolerance: worker process goes down
      The Supervisor restarts the worker process and processing continues.
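The fault-tolerance slides rely on heartbeats: a component that stops sending them is treated as dead and is restarted or has its work reassigned. The sketch below shows that idea in isolation, assuming a plain timeout check over the last heartbeat time. `HeartbeatMonitorSketch`, its methods and the 30-second timeout used in `main` are all assumptions made for illustration and are not Storm code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of heartbeat-based failure detection; not Storm code.
final class HeartbeatMonitorSketch {

    private final long timeoutMillis;
    private final Map<String, Long> lastBeat = new HashMap<>();

    HeartbeatMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Records that the given worker was alive at the given time.
    void heartbeat(String workerId, long nowMillis) {
        lastBeat.put(workerId, nowMillis);
    }

    // Returns the workers whose last heartbeat is older than the timeout.
    List<String> timedOut(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitorSketch monitor = new HeartbeatMonitorSketch(30_000); // assumed timeout
        monitor.heartbeat("worker-1", 0);
        monitor.heartbeat("worker-2", 20_000);
        // At t = 35 s only worker-1 has been silent for more than 30 s.
        System.out.println("restart or reassign: " + monitor.timedOut(35_000));
    }
}
```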
  • 25. Reliability API
      public class RandomSentenceSpout extends BaseRichSpout {
          public void nextTuple() {
              ...;
              UUID msgId = getMsgId();
              // Emit the tuple with a message id so that Storm tracks it.
              collector.emit(new Values(sentence), msgId);
          }

          public void ack(Object msgId) {
              // Do something with acked message id.
          }

          public void fail(Object msgId) {
              // Do something with failed message id.
          }
      }

      public class SplitSentenceBolt extends BaseRichBolt {
          public void execute(Tuple input) {
              for (String s : input.getString(0).split("\\s+")) {
                  // Anchor the incoming tuple to each outgoing tuple.
                  collector.emit(input, new Values(s));
              }
              // Send the ack for the incoming tuple.
              collector.ack(input);
          }
      }

      Tuple tree: "the cow jumped over the moon" -> "the", "cow", "jumped", "over", "the", "moon"
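The ack and fail callbacks on the Reliability API slide are left as "do something" placeholders. One common use, sketched here as an assumption rather than as the author's implementation, is to remember each in-flight sentence by message id, forget it on ack, and queue it for replay on fail. `ReplayBufferSketch` and its methods are hypothetical and contain no Storm calls; a spout would call them from nextTuple, ack and fail.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;

// Hypothetical bookkeeping a spout could delegate to; not part of the Storm API.
final class ReplayBufferSketch {

    private final Map<UUID, String> pending = new HashMap<>();
    private final Queue<String> replayQueue = new ArrayDeque<>();

    // Call before emitting: remembers the sentence and returns its message id.
    UUID track(String sentence) {
        UUID msgId = UUID.randomUUID();
        pending.put(msgId, sentence);
        return msgId;
    }

    // Call from ack(msgId): the tuple tree completed, so the sentence can be forgotten.
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    // Call from fail(msgId): schedule the sentence to be emitted again.
    void fail(Object msgId) {
        String sentence = pending.remove(msgId);
        if (sentence != null) {
            replayQueue.add(sentence);
        }
    }

    // Returns the next sentence to replay, or null if there is none.
    String nextReplay() {
        return replayQueue.poll();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayBufferSketch buffer = new ReplayBufferSketch();
        UUID first = buffer.track("the cow jumped over the moon");
        UUID second = buffer.track("i am at two with nature");
        buffer.ack(first);
        buffer.fail(second);
        System.out.println("replay: " + buffer.nextReplay());
        System.out.println("pending: " + buffer.pendingCount());
    }
}
```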
  • 26. Acking Framework
      An implicit acker bolt tracks each tuple tree by spout tuple id, together with the spout task id and a 64-bit number called the "ack val".
      - Emitted tuple A: XOR tuple A id with ack val
      - Emitted tuple B: XOR tuple B id with ack val
      - Emitted tuple C: XOR tuple C id with ack val
      - Acked tuple A: XOR tuple A id with ack val
      - Acked tuple B: XOR tuple B id with ack val
      - Acked tuple C: XOR tuple C id with ack val
      When the ack val has become 0, the acker implicit bolt knows the tuple tree has been completed.
      [Diagram: RandomSentence Spout, SplitSentence Bolt and WordCount Bolt passing tuples A, B and C, with "acker init", "acker ack" and "acker fail" messages sent to the acker implicit bolts]
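The XOR bookkeeping on the Acking Framework slide works because XORing the same id twice cancels out: once every emitted tuple has also been acked, the ack val returns to 0 regardless of order. A minimal sketch of that property follows. `AckValSketch` is a hypothetical class, and the fixed ids are chosen only to make the run deterministic; the slide describes the ack val as a 64-bit number.

```java
// Hypothetical illustration of the ack val bookkeeping; not Storm's acker code.
final class AckValSketch {

    private long ackVal = 0L;

    // A tuple was emitted into the tree: XOR its id into the ack val.
    void emitted(long tupleId) {
        ackVal ^= tupleId;
    }

    // A tuple was acked: XOR the same id again, cancelling the earlier XOR.
    void acked(long tupleId) {
        ackVal ^= tupleId;
    }

    // The tree is complete once every emitted id has been cancelled by its ack.
    boolean treeCompleted() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        long a = 0x1111L, b = 0x2222L, c = 0x4444L; // fixed ids for a repeatable run
        AckValSketch tree = new AckValSketch();
        tree.emitted(a);
        tree.emitted(b);
        tree.emitted(c);
        tree.acked(b); // acks may arrive in any order
        tree.acked(a);
        System.out.println("after two acks: " + tree.treeCompleted());   // false
        tree.acked(c);
        System.out.println("after three acks: " + tree.treeCompleted()); // true
    }
}
```

The appeal of this scheme is that the acker keeps a fixed amount of state per spout tuple, no matter how large the tuple tree grows.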
  • 28. Cluster Setup
      - Set up a ZooKeeper cluster
      - Install dependencies on the Nimbus and worker machines:
          - ZeroMQ 2.1.7 and JZMQ
          - Java 6 and Python 2.6.6
          - unzip
      - Download and extract a Storm release to the Nimbus and worker machines
      - Fill in the mandatory configuration in storm.yaml
      - Launch the daemons under supervision using the "storm" script
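As a rough illustration of the "mandatory configuration" step on the Cluster Setup slide, a storm.yaml for a Storm release of that era might look like the fragment below. The host names and the local directory are hypothetical placeholders; the four slot ports match the "worker slots per node = 4" figure used earlier in the deck. Check the keys against the documentation of the release you install.

```yaml
# Hypothetical hosts and paths; replace with your own.
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"
  - "zk3.example.com"
nimbus.host: "nimbus.example.com"
storm.local.dir: "/var/storm"
# One port per worker slot on each worker node.
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
```

With this in place, the daemons are started through the storm script, for example bin/storm nimbus on the master and bin/storm supervisor on each worker machine.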
  • 33. Basic Resources
      - Storm is available under the Eclipse Public License 1.0 at:
          - http://storm-project.net/
          - https://github.com/nathanmarz/storm
      - Get help on:
          - http://groups.google.com/group/storm-user
          - the #storm-user room on freenode
      - Follow @stormprocessor and @nathanmarz for updates on the project
  • 34. Many Contributions
      - Community repository of modules for using Storm, including integration with Redis, Kafka, MongoDB, HBase, JMS, Amazon SQS and so on:
          - https://github.com/nathanmarz/storm-contrib
      - Good articles for understanding Storm internals:
          - http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
          - http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
      - Good slides for understanding real-life examples:
          - http://www.slideshare.net/DanLynn1/storm-as-deep-into-realtime-data-processing-as-you-can-get-in-30-minutes
          - http://www.slideshare.net/KrishnaGade2/storm-at-twitter
  • 35. Features on Deck
      - Current release: 0.8.2 as of 6/28/2013
      - Work in progress (older): 0.8.3-wip3
          - Some bug fixes
      - Work in progress (newest): 0.9.0-wip19
          - SLF4J and Logback
          - Pluggable tuple serialization and Blowfish encryption
          - Pluggable interprocess messaging and a Netty implementation
          - Some bug fixes
          - And more
  • 36. Advanced Topics
      - Distributed RPC
      - Transactional topologies
      - Trident
      - Using non-JVM languages with Storm
      - Unit testing
      - Patterns
      These are not covered in this presentation, so check them out yourself, or in my upcoming session if I get the chance :)