1. Firing up Storm
Md. Shamsur Rahim
Student of MScCS
American International University
Bangladesh
2. Required Software Package
• Java SDK 6
▫ http://www.oracle.com/technetwork/java/javase/downloads/index.html
• Maven
▫ A software dependency management tool, used to manage the project's build, reporting, and documentation.
▫ http://maven.apache.org/download.cgi
• Spring Tool Suite: IDE
▫ An integrated development environment, used to develop applications.
▫ https://spring.io/tools/sts/all
3. Configuring Spring Tool Suite: IDE
• Go to Window | Preferences | Maven | Installations and add the path of maven-3.0.4, as shown in the following screenshot:
• Go to Window | Preferences | Java | Installed JREs and add the path of Java Runtime Environment 6 (JRE 6), as shown in the following screenshot:
5. Steps to create and execute a sample topology
1. Start your STS IDE and create a Maven project.
2. Select your Workspace location and click “Next”.
3. Now select “org.apache.maven.archetypes” [A Maven Java plugin dev. project] and click “Next”.
4. Specify com.learningstorm as Group Id and storm-example as
Artifact Id, as shown in the following screenshot:
6. Storm components
A Storm cluster follows a master-slave model where the master and slave processes are coordinated through ZooKeeper.
7. Storm components: Nimbus
• Nimbus:
▫ is the master in a Storm cluster.
▫ It is responsible for distributing application code across the worker nodes, assigning tasks to different machines, monitoring tasks for failures, and restarting them as and when required.
▫ It is stateless and stores all of its data in ZooKeeper.
▫ There is only one Nimbus node in a Storm cluster.
▫ It can be restarted without affecting the tasks already running on the worker nodes.
8. Storm components: Supervisor nodes
• Supervisor nodes
▫ are the worker nodes in a Storm cluster.
▫ Each supervisor node runs a supervisor daemon that is responsible for creating, starting, and stopping worker processes to execute the tasks assigned to that node.
▫ Like Nimbus, a supervisor daemon is also fail-fast and
stores all of its state in ZooKeeper so that it can be
restarted without any state loss.
▫ A single supervisor daemon normally handles multiple worker processes.
9. Storm components: The ZooKeeper cluster
• ZooKeeper is an application that coordinates various processes and shares some configuration information among them.
▫ Being a distributed application, Storm also uses a
ZooKeeper cluster to coordinate various processes.
▫ All of the state associated with the cluster and the various tasks submitted to Storm is stored in ZooKeeper.
▫ Nimbus and Supervisor Nodes communicate only through
ZooKeeper.
▫ Because all data is stored in ZooKeeper, we can kill the Nimbus or supervisor daemons at any time without affecting the cluster.
10. The Storm data model
• The basic unit of data that can be processed by a Storm application is called a tuple.
• Each tuple consists of a predefined list of fields.
• The value of each field can be a byte, char, integer,
long, float, double, Boolean, or byte array.
• Storm also provides an API to define your own data
types, which can be serialized as fields in a tuple.
• A tuple is dynamically typed, that is, you just need to
define the names of the fields in a tuple and not
their data type.
• A field in a tuple can be accessed by its name with getValueByField(String) or by its positional index with getValue(int).
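To make the field-name idea concrete, here is a minimal sketch in plain Java. It does not use the Storm library: the class SketchTuple and the sample field names are hypothetical, and only the method names getValueByField and getValue are taken from the slide. It shows that a tuple declares field names only, while each value may be of any type.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Storm tuple; this is NOT the Storm API.
final class SketchTuple {
    private final List<String> fields; // declared field names, no types
    private final List<Object> values; // one value per field, any type

    SketchTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("one value per field is required");
        }
        this.fields = fields;
        this.values = values;
    }

    // Look a value up by the name of its field.
    Object getValueByField(String field) {
        int index = fields.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(index);
    }

    // Look a value up by its position in the field list.
    Object getValue(int index) {
        return values.get(index);
    }
}

class SketchTupleDemo {
    public static void main(String[] args) {
        // Field names and values are made up for illustration.
        SketchTuple tuple = new SketchTuple(
                Arrays.asList("site", "visits"),
                Arrays.<Object>asList("example.org", 42L));
        System.out.println(tuple.getValueByField("site")); // prints example.org
        System.out.println(tuple.getValue(1));             // prints 42
    }
}
```

The caller is responsible for knowing what type each value has, which is what "dynamically typed" means in practice here.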
11. Definition of a Storm topology
• A topology is an abstraction that
defines the graph of the
computation.
• We create a Storm topology and
deploy it on a Storm cluster to
process the data.
• A topology can be represented by a directed acyclic graph.
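The graph view can be sketched in plain Java as follows. This is only a model of the idea, not the Storm API: the class SketchTopologyGraph and the component names are hypothetical. Components are nodes, an edge means "the output of one component feeds the next", and Kahn's algorithm is used to check that the graph has no cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a topology graph; this is NOT the Storm API.
final class SketchTopologyGraph {
    // component name -> names of the components that consume its output
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addComponent(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    void connect(String from, String to) {
        addComponent(from);
        addComponent(to);
        edges.get(from).add(to);
    }

    // Kahn's algorithm: returns the components in an order where every
    // producer precedes its consumers, or throws if the graph has a cycle.
    List<String> processingOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String name : edges.keySet()) {
            inDegree.put(name, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String current = ready.remove();
            order.add(current);
            for (String target : edges.get(current)) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }
}

class SketchTopologyGraphDemo {
    public static void main(String[] args) {
        SketchTopologyGraph graph = new SketchTopologyGraph();
        graph.connect("sentence-spout", "split-bolt");
        graph.connect("split-bolt", "count-bolt");
        // prints [sentence-spout, split-bolt, count-bolt]
        System.out.println(graph.processingOrder());
    }
}
```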
12. The Components of a Storm topology: Stream
• Stream:
▫ The key abstraction in Storm is that of a stream.
▫ It is an unbounded sequence of tuples that can be
processed in parallel by Storm.
▫ Each stream can be processed by one or more types of bolts.
▫ Each stream in a Storm application is given an ID.
▫ The bolts can produce and consume tuples from
these streams on the basis of their ID.
▫ Each stream also has an associated schema for the tuples that will flow through it.
13. The Components of a Storm topology: Spout
• Spout:
▫ A spout is the source of tuples in a Storm topology.
▫ It is responsible for reading or listening to data from an external source and publishing (emitting) it into a stream.
▫ A spout can emit multiple streams.
▫ Some important methods of a spout are: nextTuple(), open(), ack(Object msgId), and fail(Object msgId).
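How these four methods typically work together can be sketched in plain Java. This is a simplified model and not the Storm API: the class SketchQueueSpout is hypothetical, open() here takes two plain lists instead of Storm-specific arguments, and nextTuple() returns the message ID only so the example can show ack and fail. The replay-on-fail behaviour is one common design, assumed here for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified spout; this is NOT the Storm API.
final class SketchQueueSpout {
    private final Queue<String> source = new ArrayDeque<>();     // stands in for an external source
    private final Map<Object, String> pending = new HashMap<>(); // emitted but not yet acked
    private List<String> emitted;                                // stands in for the output stream
    private int nextId = 0;

    // Called once before any tuple is emitted: set up resources.
    void open(List<String> outputStream, List<String> initialData) {
        this.emitted = outputStream;
        this.source.addAll(initialData);
    }

    // Called repeatedly: emits at most one tuple per call.
    // Returns the message ID of the emitted tuple, or null if there was nothing to emit.
    Object nextTuple() {
        String value = source.poll();
        if (value == null) {
            return null;
        }
        Object msgId = nextId++;
        pending.put(msgId, value);
        emitted.add(value);
        return msgId;
    }

    // The tuple with this ID was fully processed: forget it.
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    // Processing of this tuple failed: queue its value again for replay.
    void fail(Object msgId) {
        String value = pending.remove(msgId);
        if (value != null) {
            source.add(value);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}

class SketchQueueSpoutDemo {
    public static void main(String[] args) {
        List<String> stream = new ArrayList<>();
        SketchQueueSpout spout = new SketchQueueSpout();
        spout.open(stream, Arrays.asList("a", "b"));
        Object first = spout.nextTuple();
        Object second = spout.nextTuple();
        spout.ack(first);   // "a" is done
        spout.fail(second); // "b" will be emitted again
        spout.nextTuple();
        System.out.println(stream); // prints [a, b, b]
    }
}
```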
14. The Components of a Storm topology: Bolt
• Bolt:
▫ A bolt is the processing powerhouse of a Storm topology.
▫ It is responsible for transforming a stream.
▫ Each bolt in the topology should perform a simple transformation of the tuples.
▫ Many such bolts can coordinate with each other to carry out a complex transformation.
▫ Some important methods of Bolt are:
execute(Tuple input): This method is executed for each tuple that comes through the subscribed input streams.
prepare(Map stormConf, TopologyContext context, OutputCollector collector): In this method, you should make sure the bolt is properly configured and ready to execute tuples.
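The division of work between prepare and execute can be sketched in plain Java. This is a simplified model, not the Storm API: the class SketchSplitBolt is hypothetical, a plain list stands in for the OutputCollector, and a String stands in for the input Tuple. The bolt performs one simple transformation, splitting a sentence into words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified bolt; this is NOT the Storm API.
final class SketchSplitBolt {
    private List<String> collector; // stands in for Storm's OutputCollector

    // Called once before any tuple arrives: keep whatever execute() will need.
    void prepare(List<String> collector) {
        this.collector = collector;
    }

    // Called once per input tuple: emit one output tuple per word.
    void execute(String sentence) {
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                collector.add(word);
            }
        }
    }
}

class SketchSplitBoltDemo {
    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        SketchSplitBolt bolt = new SketchSplitBolt();
        bolt.prepare(output);
        bolt.execute("storm processes streams");
        System.out.println(output); // prints [storm, processes, streams]
    }
}
```

A second bolt that counts the emitted words could consume this output, which is how several simple bolts combine into a more complex transformation.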
15. Operation modes
Operation modes indicate how the topology is
deployed in Storm.
• The local mode:
▫ Storm topologies run on the local machine in a
single JVM.
• The remote mode:
▫ We use the Storm client to submit the topology to the master, along with all the code required to execute it.