Realtime Processing with Storm
Who are we?

              Dario Simonassi
              @ldsimonassi



              Gabriel Eisbruch
              @geisbruch



              Jonathan Leibiusky
              @xetorthio
Agenda

         ● The problem

         ● Storm - Basic concepts

         ● Trident

         ● Deploying

         ● Some other features

         ● Contributions
Tons of information arriving every second




http://necromentia.files.wordpress.com/2011/01/20070103213517_iguazu.jpg
Heavy workload




http://www.computerkochen.de/fun/bilder/esel.jpg
Do it Realtime!




http://www.asimon.co.il/data/images/a10689.jpg
Complex
[Diagram: an Information Source feeds a mesh of Workers and Queues that ends in a DB]
What is Storm?

●   Is a highly distributed realtime computation system.

●   Provides general primitives to do realtime computation.

●   Can be used with any programming language.

●   It's scalable and fault-tolerant.
Example - Main Concepts
https://github.com/geisbruch/strata2012
Main Concepts - Example



         "Count tweets related to strataconf and show hashtag totals."
Main Concepts - Spout

● First, we need to read all tweets related to strataconf from Twitter.

● A Spout is a special component that is a source of messages to be processed.




                                Tweets                    Tuples
                                            Spout
Main Concepts - Spout
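      import java.util.Map;

      // Storm 0.8-era packages; Apache Storm 1.0+ uses org.apache.storm.* instead.
      import backtype.storm.spout.SpoutOutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichSpout;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Values;

      // TwitterReader and Tweet come from the example repository.
      public class ApiStreamingSpout extends BaseRichSpout {
        SpoutOutputCollector collector;
        TwitterReader reader;

        // Called repeatedly by Storm; emits one tuple per available tweet.
        public void nextTuple() {
          Tweet tweet = reader.getNextTweet();
          if (tweet != null)
            collector.emit(new Values(tweet));
        }

        // Called once when the spout task starts; opens the Twitter stream.
        public void open(Map conf, TopologyContext context,
            SpoutOutputCollector collector) {
          reader = new TwitterReader(conf.get("track"), conf.get("user"), conf.get("pass"));
          this.collector = collector;
        }

        // Every emitted tuple has a single field named "tweet".
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
          declarer.declare(new Fields("tweet"));
        }
      }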



                                           Tweets                                Tuples
                                                             Spout
Main Concepts - Bolt

● For every tweet, we find hashtags

● A Bolt is a component that does the processing.




                        tweet                                            hashtags
                                                                        hashtags
                        stream               tweet                      hashtags

                                  Spout              HashtagsSplitter
Main Concepts - Bolt
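         import backtype.storm.topology.BasicOutputCollector;
         import backtype.storm.topology.OutputFieldsDeclarer;
         import backtype.storm.topology.base.BaseBasicBolt;
         import backtype.storm.tuple.Fields;
         import backtype.storm.tuple.Tuple;
         import backtype.storm.tuple.Values;

         public class HashtagsSplitter extends BaseBasicBolt {
           // Emits one "hashtag" tuple for every hashtag found in the incoming tweet.
           public void execute(Tuple input, BasicOutputCollector collector) {
             Tweet tweet = (Tweet)input.getValueByField("tweet");
             for(String hashtag : tweet.getHashTags()){
               collector.emit(new Values(hashtag));
             }
           }
           public void declareOutputFields(OutputFieldsDeclarer declarer) {
             declarer.declare(new Fields("hashtag"));
           }
         }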


                    tweet                                                 hashtags
                                              tweet                      hashtags
                                                                         hashtags
                    stream
                                Spout                 HashtagsSplitter
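The splitter relies on the tweet object to list its hashtags. As a minimal sketch of what such a lookup could do, assuming hashtags are simply a '#' followed by word characters, a regular expression is enough. The class and method names below are hypothetical and are not taken from the example repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the talk's real Tweet class lives in the example repository.
final class HashtagExtractorSketch {
    // A '#' followed by one or more word characters (letters, digits, underscore).
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Returns every hashtag in the text, in order of appearance, '#' included.
    static List<String> extractHashtags(String text) {
        List<String> hashtags = new ArrayList<>();
        Matcher matcher = HASHTAG.matcher(text);
        while (matcher.find()) {
            hashtags.add(matcher.group());
        }
        return hashtags;
    }

    public static void main(String[] args) {
        System.out.println(extractHashtags("This tweet has #hello #world hashtags!!!"));
    }
}
```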
Main Concepts - Bolt
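           import java.util.Map;
           import backtype.storm.task.TopologyContext;
           import backtype.storm.topology.BasicOutputCollector;
           import backtype.storm.topology.OutputFieldsDeclarer;
           import backtype.storm.topology.base.BaseBasicBolt;
           import backtype.storm.tuple.Tuple;
           import redis.clients.jedis.Jedis;

           public class HashtagsCounterBolt extends BaseBasicBolt {
             private Jedis jedis;

             public void execute(Tuple input, BasicOutputCollector collector) {
               String hashtag = input.getStringByField("hashtag");
               // Increment this hashtag's field in the Redis hash "hashs" by one.
               jedis.hincrBy("hashs", hashtag, 1);
             }

             public void prepare(Map conf, TopologyContext context) {
               jedis = new Jedis("localhost");
             }

             // Terminal bolt: it emits nothing, so it declares no output fields.
             public void declareOutputFields(OutputFieldsDeclarer declarer) { }
           }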

        tweet
        stream                     tweet                      hashtag
                                                              hashtag
                                                               hashtag
                      Spout                HashtagsSplitter              HashtagsCounterBolt
Main Concepts - Topology
     public static StormTopology createTopology() {
       TopologyBuilder builder = new TopologyBuilder();

         // One spout task feeds two splitter tasks; tweets are distributed randomly.
         builder.setSpout("tweets-collector", new ApiStreamingSpout(), 1);
         builder.setBolt("hashtags-splitter", new HashtagsSplitter(), 2)
           .shuffleGrouping("tweets-collector");
         // Group on the "hashtag" field declared by the splitter, so the same
         // hashtag always reaches the same counter task.
         builder.setBolt("hashtags-counter", new HashtagsCounterBolt(), 2)
           .fieldsGrouping("hashtags-splitter", new Fields("hashtag"));

         return builder.createTopology();
     }

                                                               fields
                                  shuffle

                                            HashtagsSplitter            HashtagsCounterBolt
                      Spout
                                            HashtagsSplitter            HashtagsCounterBolt
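To make the difference between the two groupings concrete, here is a small sketch of the idea behind a fields grouping: the same field value is always routed to the same task. Hash-modulo routing and the class name `GroupingSketch` are assumptions for illustration, not Storm's actual implementation. A shuffle grouping, by contrast, gives no such guarantee and only spreads tuples evenly.

```java
// Hypothetical illustration only: Storm's real grouping code differs.
final class GroupingSketch {
    // Fields grouping idea: equal field values always map to the same task index.
    static int fieldsGroupingTask(String fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int counterTasks = 2; // matches the parallelism hint of "hashtags-counter"
        for (String tag : new String[] {"#hello", "#world", "#bye", "#world"}) {
            System.out.println(tag + " -> counter task " + fieldsGroupingTask(tag, counterTasks));
        }
    }
}
```

This property would matter if each counter task kept its own in-memory totals, since all occurrences of a hashtag must then land on one task.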
Example Overview - Read Tweet

 [Diagram: Spout -> Splitter -> Counter -> Redis]

 ● The Spout reads: "This tweet has #hello #world hashtags!!!"
Example Overview - Split Tweet

 ● The Splitter emits #hello and #world.

 ● The Counter calls incr(#hello) and incr(#world). Redis now holds #hello = 1, #world = 1.

 ● The Spout reads a second tweet: "This tweet has #bye #world hashtags"

 ● The Splitter emits #world and #bye.

 ● The Counter calls incr(#bye) and incr(#world). Redis now holds #hello = 1, #world = 2, #bye = 1.
Main Concepts - In summary

 ○ Topology

 ○ Spout

 ○ Stream

 ○ Bolt
Guaranteeing Message Processing




                                  http://www.steinborn.com/home_warranty.php
Guaranteeing Message Processing




                       HashtagsSplitter   HashtagsCounterBolt
             Spout
                       HashtagsSplitter   HashtagsCounterBolt
Guaranteeing Message Processing




                       HashtagsSplitter        HashtagsCounterBolt
             Spout
                       HashtagsSplitter        HashtagsCounterBolt

                      _collector.ack(tuple);
Guaranteeing Message Processing




                                      HashtagsSplitter   HashtagsCounterBolt
               Spout
                                      HashtagsSplitter   HashtagsCounterBolt

           void fail(Object msgId);
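As a rough illustration of how a spout message can be tracked until every bolt has acked its tuples, the sketch below XORs tuple ids into one running value per spout message, which is the idea Storm's acker tasks are documented to use. The class name `AckerSketch`, the string message id and the small tuple ids are assumptions made for the example; real tuple ids are random 64-bit values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of ack tracking; not Storm's actual acker code.
final class AckerSketch {
    // One running XOR value per spout message that is still being processed.
    private final Map<String, Long> ackVals = new HashMap<>();

    // XORs tupleId into the value for msgId. Each tuple id is XORed in once when
    // the tuple is created and once when it is acked, so the value returns to
    // zero exactly when every created tuple has been acked.
    // Returns true when the tree is complete (the spout's ack(msgId) would fire).
    boolean update(String msgId, long tupleId) {
        long value = ackVals.getOrDefault(msgId, 0L) ^ tupleId;
        if (value == 0L) {
            ackVals.remove(msgId);
            return true;
        }
        ackVals.put(msgId, value);
        return false;
    }

    // True while the message is not fully processed; after a timeout such a
    // message would be reported through the spout's fail(msgId).
    boolean isPending(String msgId) {
        return ackVals.containsKey(msgId);
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        acker.update("tweet-1", 11L); // spout emits the tweet tuple
        acker.update("tweet-1", 22L); // splitter emits a first hashtag tuple
        acker.update("tweet-1", 33L); // splitter emits a second hashtag tuple
        acker.update("tweet-1", 11L); // splitter acks the tweet tuple
        acker.update("tweet-1", 22L); // counter acks the first hashtag
        System.out.println(acker.update("tweet-1", 33L)); // counter acks the second
    }
}
```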
Trident




          http://www.raytekmarine.com/images/trident.jpg
Trident - What is Trident?


   ● High-level abstraction on top of Storm.

   ● Makes it easier to build topologies.

   ● Plays a role similar to Pig on top of Hadoop.
Trident - Topology



   TridentTopology topology =
                    new TridentTopology();



                                                          hashtags
                                                         hashtags
             tweet                                       hashtags
             stream           tweet

                      Spout           HashtagsSplitter               Memory
Trident - Stream



  Stream tweetsStream = topology
          .newStream("tweets", new ApiStreamingSpout());




                        tweet
                        stream
                                 Spout
Trident
          Stream hashtagsStream =
               tweetsStream.each(new Fields("tweet"), new BaseFunction() {
                   public void execute(TridentTuple tuple,
                                 TridentCollector collector) {
                       Tweet tweet = (Tweet)tuple.getValueByField("tweet");
                       for(String hashtag : tweet.getHashtags())
                          collector.emit(new Values(hashtag));
                   }
                }, new Fields("hashtag")
             );



                       tweet                                           hashtags
                                                                      hashtags
                       stream              tweet                      hashtags

                                Spout              HashtagsSplitter
Trident - States

   TridentState hashtagCounts =
      hashtagsStream.groupBy(new Fields("hashtag"))
       .persistentAggregate(new MemoryMapState.Factory(),
                            new Count(),
                            new Fields("count"));




                                                          hashtags
                                                         hashtags
             tweet                                       hashtags
             stream           tweet
                                                                     Memory
                      Spout           HashtagsSplitter
                                                                     (State)
Trident - Complete example


 TridentState hashtags = topology
        .newStream("tweets", new ApiStreamingSpout())
        .each(new Fields("tweet"),new HashTagSplitter(), new Fields("hashtag"))
        .groupBy(new Fields("hashtag"))
        .persistentAggregate(new MemoryMapState.Factory(),
                                      new Count(),
                                      new Fields("count"));




                                                              hashtags
                                                             hashtags
               tweet                                         hashtags
               stream             tweet
                                                                         Memory
                        Spout             HashtagsSplitter
                                                                         (State)
Transactions
  ● Exactly-once message processing semantics
  ● The intermediate processing may be executed in parallel
  ● But persistence should be strictly sequential
  ● Spouts and persistence should be designed to be transactional
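One way to picture how sequential, transactional persistence yields exactly-once counts is to store the id of the last applied batch next to each value and skip a batch that is replayed. The sketch below assumes batches are applied in transaction-id order and that a replayed batch carries the same tuples; the class `TransactionalCounterSketch` and its methods are hypothetical, not Trident's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a transactional counter; not Trident's actual state code.
final class TransactionalCounterSketch {
    private static final class Entry {
        long count;
        long lastTxId = -1L; // id of the last batch applied to this key
    }

    private final Map<String, Entry> store = new HashMap<>();

    // Adds the batch's partial count for a hashtag unless this batch
    // (identified by txId) was already applied to that hashtag.
    void applyBatch(long txId, String hashtag, long partialCount) {
        Entry entry = store.computeIfAbsent(hashtag, key -> new Entry());
        if (entry.lastTxId == txId) {
            return; // replayed batch: the update is already persisted
        }
        entry.count += partialCount;
        entry.lastTxId = txId;
    }

    long get(String hashtag) {
        Entry entry = store.get(hashtag);
        return entry == null ? 0L : entry.count;
    }

    public static void main(String[] args) {
        TransactionalCounterSketch counts = new TransactionalCounterSketch();
        counts.applyBatch(1L, "#world", 2L);
        counts.applyBatch(1L, "#world", 2L); // replay of batch 1 is ignored
        counts.applyBatch(2L, "#world", 1L);
        System.out.println(counts.get("#world"));
    }
}
```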
Trident - Queries




     We want to query, online, the number of tweets with the hashtag hadoop and the
                    number of tweets with the hashtag strataconf
Trident - DRPC
Trident - Query



 // The port must be an int; myDrpcPort is a placeholder for your DRPC server's port.
 DRPCClient drpc = new DRPCClient("MyDRPCServer", myDrpcPort);

 String result = drpc.execute("counts", "strataconf hadoop");
Trident - Query




       Stream drpcStream =
              topology.newDRPCStream("counts",drpc);
Trident - Query



  Stream singleHashtag =
       drpcStream.each(new Fields("args"), new Split(), new Fields("hashtag"));




                                     DRPC            Hashtag
                DRPC                 stream          Splitter
                Server
Trident - Query



   Stream hashtagsCountQuery =
        singleHashtag.stateQuery(hashtagCounts, new Fields("hashtag"),
        new MapGet(), new Fields("count"));




                            DRPC              Hashtag         Hashtag Count
                            stream            Splitter           Query
       DRPC
       Server
Trident - Query



     Stream drpcCountAggregate =
                hashtagsCountQuery.groupBy(new Fields("hashtag"))
                .aggregate(new Fields("count"), new Sum(), new Fields("sum"));




                         DRPC           Hashtag       Hashtag Count   Aggregate
    DRPC                 stream         Splitter         Query          Count
    Server
Trident - Query - Complete Example


    topology.newDRPCStream("counts",drpc)
        .each(new Fields("args"), new Split(), new Fields("hashtag"))
        .stateQuery(hashtags, new Fields("hashtag"),
              new MapGet(), new Fields("count"))
        .each(new Fields("count"), new FilterNull())
        .groupBy(new Fields("hashtag"))
        .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
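To show what this query pipeline computes, the sketch below replays the same steps over an ordinary map in plain Java. It only mirrors the data flow (split, look up, drop nulls, sum per hashtag); the class `DrpcQuerySketch` and the sample counts are made up for illustration and nothing here uses Storm or Trident.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java replay of the query's data flow; no Storm involved.
final class DrpcQuerySketch {
    static Map<String, Long> query(Map<String, Long> state, String args) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String hashtag : args.split(" ")) {    // Split: one tuple per word
            Long count = state.get(hashtag);        // stateQuery with MapGet
            if (count == null) {
                continue;                           // FilterNull: unknown hashtag
            }
            sums.merge(hashtag, count, Long::sum);  // groupBy("hashtag") + Sum
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();
        state.put("strataconf", 5L); // made-up sample counts
        state.put("hadoop", 3L);
        System.out.println(query(state, "strataconf hadoop"));
    }
}
```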
Trident - Query - Complete Example

  [Diagram: DRPC Server -> DRPC stream -> Hashtag Splitter -> Hashtag Count Query -> Aggregate Count]

  ● The client calls the DRPC server.

  ● The DRPC server holds the request and executes it on the DRPC stream.

  ● The request stays on hold while the topology processes it; on finish, the result returns to the DRPC server.

  ● The DRPC server returns the response to the client.
Running Topologies
Running Topologies - Local Cluster
    Config conf = new Config();
    conf.put("some_property","some_value");
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("twitter-hashtag-summarizer", conf, builder.createTopology());



    ●   The whole topology runs on a single machine.

    ●   Behaves much like running on a production cluster

    ●   Very useful for testing and development

    ●   Configurable parallelism

    ●   Methods similar to those of StormSubmitter
Running Topologies - Production

   package twitter.streaming;
   public class Topology {
     // submitTopology declares checked exceptions, so main propagates them.
     public static void main(String[] args) throws Exception {
       // conf and builder are created as on the previous slides.
       StormSubmitter.submitTopology("twitter-hashtag-summarizer",
          conf, builder.createTopology());
     }
   }


    mvn package

    storm jar Strata2012-0.0.1.jar twitter.streaming.Topology \
         "$REDIS_HOST" $REDIS_PORT "strata,strataconf" \
       $TWITTER_USER $TWITTER_PASS
Running Topologies - Cluster Layout

● Single master process

● No SPOF

● Automatic task distribution

● Number of tasks per process is configurable

● Automatic task redistribution when supervisors fail

● Work continues when Nimbus fails

[Diagram: Storm Cluster with Nimbus (the master process), three Supervisors (where the processing is done), and a Zookeeper Cluster (only for coordination and cluster state)]
Running Topologies - Cluster Architecture - No SPOF

          Storm Cluster


                                            Supervisor
                                      (where the processing is done)




                   Nimbus                   Supervisor
               (The master Process)   (where the processing is done)




                                            Supervisor
                                      (where the processing is done)
Other features
Other features - Non JVM languages
    ● Use Non-JVM Languages

    ● Simple protocol

    ● Use stdin & stdout for communication

    ● Many implementations and abstraction layers done by the community

    {
        "conf": {
            "topology.message.timeout.secs": 3,
            // etc
        },
        "context": {
            "task->component": {
                "1": "example-spout",
                "2": "__acker",
                "3": "example-bolt"
            },
            "taskid": 3
        },
        "pidDir": "..."
    }
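Messages like the one above travel over stdin and stdout, and in Storm's multilang protocol each JSON message is followed by a line containing only `end`. As a hedged sketch of that framing, the code below splits raw input into messages on those `end` lines. It does not parse the JSON, and the class `MultilangFramingSketch` and the sample payloads are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of message framing only; JSON parsing is left out.
final class MultilangFramingSketch {
    // Splits raw input into messages, each terminated by a line equal to "end".
    // Trailing lines without a closing "end" are treated as incomplete and dropped.
    static List<String> splitMessages(String raw) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : raw.split("\n")) {
            if (line.equals("end")) {
                messages.add(current.toString());
                current.setLength(0);
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        String raw = "{\"pidDir\": \"...\"}\nend\n{\"command\": \"next\"}\nend\n";
        for (String message : splitMessages(raw)) {
            System.out.println("message: " + message);
        }
    }
}
```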
Contribution
●   A collection of community developed spouts, bolts, serializers, and other goodies.
●   You can find it in: https://github.com/nathanmarz/storm-contrib

●   Some of those goodies are:

    ○   cassandra: Integrates Storm and Cassandra by writing tuples to a Cassandra Column Family.

    ○   mongodb: Provides a spout to read documents from mongo, and a bolt to write documents to mongo.

    ○   SQS: Provides a spout to consume messages from Amazon SQS.

    ○   hbase: Provides a spout and a bolt to read and write from HBase.

    ○   kafka: Provides a spout to read messages from kafka.

    ○   rdbms: Provides a bolt to write tuples to a table.

    ○   redis-pubsub: Provides a spout to read messages from redis pubsub.

    ○   And more...
Questions?

Introduction to Git and GitHubIntroduction to Git and GitHub
Introduction to Git and GitHub
 
The journey to GitOps
The journey to GitOpsThe journey to GitOps
The journey to GitOps
 
KI University - Git internals
KI University - Git internalsKI University - Git internals
KI University - Git internals
 
State of GeoTools 2012
State of GeoTools 2012State of GeoTools 2012
State of GeoTools 2012
 
Adopting F# at SBTech
Adopting F# at SBTechAdopting F# at SBTech
Adopting F# at SBTech
 
OpenWhisk by Example - Auto Retweeting Example in Python
OpenWhisk by Example - Auto Retweeting Example in PythonOpenWhisk by Example - Auto Retweeting Example in Python
OpenWhisk by Example - Auto Retweeting Example in Python
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Realtime processing with storm presentation

  • 2. Who are we? Dario Simonassi @ldsimonassi Gabriel Eisbruch @geisbruch Jonathan Leibiusky @xetorthio
  • 3. Agenda ● The problem ● Storm - Basic concepts ● Trident ● Deploying ● Some other features ● Contributions
  • 4. Tons of information arriving every second http://necromentia.files.wordpress.com/2011/01/20070103213517_iguazu.jpg
  • 5. Heavy workload http://www.computerkochen.de/fun/bilder/esel.jpg
  • 6. Do it Realtime! http://www.asimon.co.il/data/images/a10689.jpg
  • 7.
  • 8.
  • 9. Complex [diagram: an information source feeds several layers of workers connected by queues; the last workers write to a DB]
  • 10. What is Storm? ● Is a highly distributed realtime computation system. ● Provides general primitives to do realtime computation. ● Can be used with any programming language. ● It's scalable and fault-tolerant.
  • 11. Example - Main Concepts https://github.com/geisbruch/strata2012
  • 12. Main Concepts - Example "Count tweets related to strataconf and show hashtag totals."
  • 13. Main Concepts - Spout ● The first thing we need is to read all tweets related to strataconf from Twitter. ● A Spout is a special component that is a source of messages to be processed. [diagram: Tweets → Spout → Tuples]
  • 14. Main Concepts - Spout
      public class ApiStreamingSpout extends BaseRichSpout {
          SpoutOutputCollector collector;
          TwitterReader reader;

          // Called repeatedly by Storm; emits one tuple per available tweet.
          public void nextTuple() {
              Tweet tweet = reader.getNextTweet();
              if (tweet != null)
                  collector.emit(new Values(tweet));
          }

          // Called once when the spout task starts.
          public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
              reader = new TwitterReader(conf.get("track"), conf.get("user"), conf.get("pass"));
              this.collector = collector;
          }

          // The spout emits one-field tuples; the field is named "tweet".
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("tweet"));
          }
      }
    [diagram: Tweets → Spout → Tuples]
  • 15. Main Concepts - Bolt ● For every tweet, we find its hashtags. ● A Bolt is a component that does the processing. [diagram: Spout → tweet stream → HashtagsSplitter → hashtags]
  • 16. Main Concepts - Bolt
      public class HashtagsSplitter extends BaseBasicBolt {
          // Emits one tuple per hashtag found in the incoming tweet.
          public void execute(Tuple input, BasicOutputCollector collector) {
              Tweet tweet = (Tweet) input.getValueByField("tweet");
              for (String hashtag : tweet.getHashTags()) {
                  collector.emit(new Values(hashtag));
              }
          }

          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("hashtag"));
          }
      }
    [diagram: Spout → tweet stream → HashtagsSplitter → hashtags]
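
The splitter bolt relies on `tweet.getHashTags()`, which the slides do not show. A minimal sketch of what such a method might do, assuming a hashtag is `#` followed by letters, digits, or underscores; `HashtagExtractor` is a hypothetical helper, not part of the talk's repository or of Storm:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one possible way to pull hashtags out of tweet text.
final class HashtagExtractor {
    // Assumption: a hashtag is '#' followed by one or more word characters.
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    static List<String> extract(String text) {
        List<String> hashtags = new ArrayList<>();
        Matcher matcher = HASHTAG.matcher(text);
        while (matcher.find()) {
            hashtags.add(matcher.group());
        }
        return hashtags;
    }

    public static void main(String[] args) {
        // prints [#hello, #world]
        System.out.println(extract("This tweet has #hello #world hashtags!!!"));
    }
}
```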
  • 17. Main Concepts - Bolt
      public class HashtagsCounterBolt extends BaseBasicBolt {
          private Jedis jedis;

          // Increments the counter of the received hashtag in the Redis hash "hashs".
          public void execute(Tuple input, BasicOutputCollector collector) {
              String hashtag = input.getStringByField("hashtag");
              jedis.hincrBy("hashs", hashtag, 1);
          }

          public void prepare(Map conf, TopologyContext context) {
              jedis = new Jedis("localhost");
          }

          // Terminal bolt: it emits nothing, so it declares no output fields.
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
          }
      }
    [diagram: Spout → tweet stream → HashtagsSplitter → hashtag → HashtagsCounterBolt]
  • 18. Main Concepts - Topology
      public static StormTopology createTopology() {
          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("tweets-collector", new ApiStreamingSpout(), 1);
          // Tweets are distributed randomly among the splitter tasks.
          builder.setBolt("hashtags-splitter", new HashtagsSplitter(), 2)
                 .shuffleGrouping("tweets-collector");
          // The same hashtag always goes to the same counter task.
          builder.setBolt("hashtags-counter", new HashtagsCounterBolt(), 2)
                 .fieldsGrouping("hashtags-splitter", new Fields("hashtag"));
          return builder.createTopology();
      }
    [diagram: Spout → (shuffle) → HashtagsSplitter x2 → (fields) → HashtagsCounterBolt x2]
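
The difference between the two groupings can be illustrated outside Storm. The sketch below is only a simulation of the idea behind a fields grouping (equal field values are routed to the same task); the hash function and the `GroupingSketch` class are assumptions for illustration, not Storm's internal implementation:

```java
import java.util.List;

// Simulation only: shows why a fields grouping keeps equal values on one task.
final class GroupingSketch {
    // The grouping field's hash picks the task, so equal values always
    // map to the same task index in the range [0, numTasks).
    static int fieldsGroupingTask(String fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int counterTasks = 2; // same parallelism as the counter bolt in the slide
        for (String hashtag : List.of("#hello", "#world", "#bye", "#world")) {
            System.out.println(hashtag + " -> counter task "
                    + fieldsGroupingTask(hashtag, counterTasks));
        }
    }
}
```

With a shuffle grouping there is no such guarantee, which is fine for the splitter because it keeps no per-hashtag state.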
  • 19. Example Overview - Read Tweet [pipeline: Spout → Splitter → Counter → Redis] Spout reads: "This tweet has #hello #world hashtags!!!"
  • 20. Example Overview - Split Tweet [pipeline: Spout → Splitter → Counter → Redis] Splitter emits: #hello, #world
  • 21. Example Overview - Split Tweet [pipeline: Spout → Splitter → Counter → Redis] Counter runs incr(#hello) and incr(#world). Redis: #hello = 1, #world = 1
  • 22. Example Overview - Split Tweet [pipeline: Spout → Splitter → Counter → Redis] Spout reads: "This tweet has #bye #world hashtags". Redis: #hello = 1, #world = 1
  • 23. Example Overview - Split Tweet [pipeline: Spout → Splitter → Counter → Redis] Splitter emits: #world, #bye. Redis: #hello = 1, #world = 1
  • 24. Example Overview - Split Tweet [pipeline: Spout → Splitter → Counter → Redis] Counter runs incr(#bye) and incr(#world). Redis: #hello = 1, #world = 2, #bye = 1
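
The walk-through above can be replayed in plain Java. This is a sketch under simplifying assumptions: an in-memory map stands in for Redis, hashtags are found by splitting on whitespace, and `HashtagCountWalkthrough` is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulation of the Spout -> Splitter -> Counter -> Redis walk-through.
final class HashtagCountWalkthrough {
    static Map<String, Long> run(String... tweets) {
        Map<String, Long> redis = new LinkedHashMap<>(); // stand-in for the Redis hash
        for (String tweet : tweets) {                    // spout: one tuple per tweet
            for (String word : tweet.split("\\s+")) {    // splitter: one tuple per hashtag
                if (word.startsWith("#")) {
                    redis.merge(word, 1L, Long::sum);    // counter: increment by one
                }
            }
        }
        return redis;
    }

    public static void main(String[] args) {
        // prints {#hello=1, #world=2, #bye=1}
        System.out.println(run(
                "This tweet has #hello #world hashtags!!!",
                "This tweet has #bye #world hashtags"));
    }
}
```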
  • 25. Main Concepts - In summary ○ Topology ○ Spout ○ Stream ○ Bolt
  • 26. Guaranteeing Message Processing http://www.steinborn.com/home_warranty.php
  • 27. Guaranteeing Message Processing [diagram: Spout → HashtagsSplitter x2 → HashtagsCounterBolt x2]
  • 28. Guaranteeing Message Processing [diagram: Spout → HashtagsSplitter x2 → HashtagsCounterBolt x2] A bolt acknowledges a processed tuple with: _collector.ack(tuple);
  • 31. Guaranteeing Message Processing [diagram: Spout → HashtagsSplitter x2 → HashtagsCounterBolt x2] When processing of a message fails, the spout is notified through: void fail(Object msgId);
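
One common way for a spout to use these callbacks is to remember every emitted message until it is acked and to re-queue it when it fails. The sketch below simulates that bookkeeping in plain Java; `ReplayingSource` and its methods are hypothetical and do not implement Storm's spout interface:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Simulation only: bookkeeping a reliable spout could do around ack/fail.
final class ReplayingSource {
    private final Queue<String> toEmit = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>(); // emitted, not yet acked
    private long nextId = 0;

    ReplayingSource(String... messages) {
        for (String message : messages) {
            toEmit.add(message);
        }
    }

    // Emits the next message under a fresh message id; returns -1 when idle.
    long emitNext() {
        String message = toEmit.poll();
        if (message == null) {
            return -1;
        }
        long msgId = nextId++;
        pending.put(msgId, message);
        return msgId;
    }

    // Fully processed: the message can be forgotten.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Processing failed: queue the message again so it is replayed.
    void fail(long msgId) {
        String message = pending.remove(msgId);
        if (message != null) {
            toEmit.add(message);
        }
    }

    int pendingCount() { return pending.size(); }

    int queuedCount() { return toEmit.size(); }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource("tweet-1", "tweet-2");
        long first = source.emitNext();
        long second = source.emitNext();
        source.ack(first);   // tweet-1 is done
        source.fail(second); // tweet-2 goes back to the queue
        System.out.println("queued for replay: " + source.queuedCount());
    }
}
```

Replaying gives at-least-once processing: a message may be counted twice unless the persistence step is made idempotent.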
  • 32. Trident http://www.raytekmarine.com/images/trident.jpg
  • 33. Trident - What is Trident? ● A high-level abstraction on top of Storm. ● Makes it easier to build topologies. ● Comparable to what Pig is for Hadoop.
  • 34. Trident - Topology
      TridentTopology topology = new TridentTopology();
    [diagram: Spout → tweet stream → HashtagsSplitter → hashtags → Memory]
  • 35. Trident - Stream
      Stream tweetsStream = topology
          .newStream("tweets", new ApiStreamingSpout());
    [diagram: Spout → tweet stream]
  • 36. Trident
      Stream hashtagsStream = tweetsStream.each(new Fields("tweet"), new BaseFunction() {
          // Emits one tuple per hashtag found in the tweet.
          public void execute(TridentTuple tuple, TridentCollector collector) {
              Tweet tweet = (Tweet) tuple.getValueByField("tweet");
              for (String hashtag : tweet.getHashTags())
                  collector.emit(new Values(hashtag));
          }
      }, new Fields("hashtag"));
    [diagram: Spout → tweet stream → HashtagsSplitter → hashtags]
  • 37. Trident - States
      TridentState hashtagCounts =
          hashtagsStream.groupBy(new Fields("hashtag"))
              .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));
    [diagram: Spout → tweet stream → HashtagsSplitter → hashtags → Memory (State)]
  • 41. Trident - Complete example
      TridentState hashtags = topology
          .newStream("tweets", new ApiStreamingSpout())
          .each(new Fields("tweet"), new HashTagSplitter(), new Fields("hashtag"))
          .groupBy(new Fields("hashtag"))
          .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));
    [diagram: Spout → tweet stream → HashtagsSplitter → hashtags → Memory (State)]
  • 43. Transactions ● Exactly-once message processing semantics
  • 44. Transactions ● The intermediate processing may be executed in parallel
  • 45. Transactions ● But persistence should be strictly sequential
  • 46. Transactions ● Spouts and persistence should be designed to be transactional
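
A way to picture how sequential, transactional persistence yields exactly-once counts: store the id of the last applied batch next to the value and skip a batch that is replayed. The sketch below assumes batches are committed in strictly increasing transaction-id order; `TransactionalCounter` is a hypothetical class, not Trident's state API:

```java
// Simulation only: a counter that ignores replays of an already-applied batch.
final class TransactionalCounter {
    private long count = 0;
    private long lastTxId = -1; // id of the last batch that was applied

    // Applies a batch's partial count. Assumes batches commit in increasing
    // txId order, so a txId that is not newer than lastTxId is a replay.
    void applyBatch(long txId, long delta) {
        if (txId <= lastTxId) {
            return; // already persisted: applying it again would double count
        }
        count += delta;
        lastTxId = txId;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        TransactionalCounter counter = new TransactionalCounter();
        counter.applyBatch(1, 3);
        counter.applyBatch(1, 3); // replay of batch 1 after a failure
        counter.applyBatch(2, 2);
        System.out.println(counter.count()); // prints 5
    }
}
```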
  • 47. Trident - Queries We want to query, online, the number of tweets with the hashtag hadoop and the number of tweets with the hashtag strataconf.
  • 49. Trident - Query
      // "MyDRPCServer" is a placeholder host; 3772 is Storm's default DRPC port.
      DRPCClient drpc = new DRPCClient("MyDRPCServer", 3772);
      drpc.execute("counts", "strataconf hadoop");
  • 50. Trident - Query
      // Here drpc is a LocalDRPC instance used for local testing;
      // on a real cluster use topology.newDRPCStream("counts").
      Stream drpcStream = topology.newDRPCStream("counts", drpc);
  • 51. Trident - Query
      Stream singleHashtag = drpcStream.each(new Fields("args"), new Split(), new Fields("hashtag"));
    [diagram: DRPC Server → DRPC stream → Hashtag Splitter]
  • 52. Trident - Query
      Stream hashtagsCountQuery = singleHashtag.stateQuery(hashtagCounts,
          new Fields("hashtag"), new MapGet(), new Fields("count"));
    [diagram: DRPC Server → DRPC stream → Hashtag Splitter → Hashtag Count Query]
  • 53. Trident - Query
      Stream drpcCountAggregate = hashtagsCountQuery.groupBy(new Fields("hashtag"))
          .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
    [diagram: DRPC Server → DRPC stream → Hashtag Splitter → Hashtag Count Query → Aggregate Count → DRPC Server]
  • 54. Trident - Query - Complete Example
      topology.newDRPCStream("counts", drpc)
          .each(new Fields("args"), new Split(), new Fields("hashtag"))
          .stateQuery(hashtags, new Fields("hashtag"), new MapGet(), new Fields("count"))
          .each(new Fields("count"), new FilterNull())
          .groupBy(new Fields("hashtag"))
          .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
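
The data flow of this query can be followed step by step without a cluster. The sketch below mirrors each stage with ordinary collections; the map stands in for the Trident state and `CountsQuerySketch` is a hypothetical name, so it only approximates what the DRPC topology does:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simulation of the "counts" query: split args, look up counts, drop nulls, sum.
final class CountsQuerySketch {
    static Map<String, Long> query(Map<String, Long> state, String args) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String hashtag : args.trim().split("\\s+")) { // Split: one tuple per word
            Long count = state.get(hashtag);               // stateQuery with MapGet
            if (count == null) {
                continue;                                  // FilterNull: unknown hashtag
            }
            sums.merge(hashtag, count, Long::sum);         // groupBy + Sum
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();
        state.put("strataconf", 5L);
        state.put("hadoop", 2L);
        // prints {strataconf=5, hadoop=2}
        System.out.println(query(state, "strataconf hadoop"));
    }
}
```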
  • 55. Trident - Query - Complete Example [diagram: DRPC Server → DRPC stream → Hashtag Splitter → Hashtag Count Query → Aggregate Count → DRPC Server]
  • 56. Trident - Query - Complete Example [same diagram] Client call.
  • 57. Trident - Query - Complete Example [same diagram] The DRPC server holds the request while it is executed.
  • 58. Trident - Query - Complete Example [same diagram] The request stays on hold; on finish, the result returns to the DRPC server.
  • 59. Trident - Query - Complete Example [same diagram] The DRPC server returns the response to the client.
  • 61. Running Topologies - Local Cluster
      Config conf = new Config();
      conf.put("some_property", "some_value");
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("twitter-hashtag-summarizer", conf, builder.createTopology());
      ● The whole topology runs on a single machine.
      ● Similar to running in a production cluster.
      ● Very useful for testing and development.
      ● Configurable parallelism.
      ● Methods similar to those of StormSubmitter.
  • 62. Running Topologies - Production
      package twitter.streaming;

      public class Topology {
          public static void main(String[] args) throws Exception {
              // conf and builder are created as in the previous slides.
              StormSubmitter.submitTopology("twitter-hashtag-summarizer", conf, builder.createTopology());
          }
      }

      mvn package
      storm jar Strata2012-0.0.1.jar twitter.streaming.Topology "$REDIS_HOST" $REDIS_PORT "strata,strataconf" $TWITTER_USER $TWITTER_PASS
  • 63. Running Topologies - Cluster Layout
      ● Single master process
      ● No SPOF
      ● Automatic task distribution
      ● Number of tasks configurable per process
      ● Automatic task redistribution when supervisors fail
      ● Work continues when Nimbus fails
    [diagram: Storm Cluster with Nimbus (the master process) and three Supervisors (where the processing is done); a Zookeeper Cluster is used only for coordination and cluster state]
  • 64. Running Topologies - Cluster Architecture - No SPOF [diagram: Storm Cluster with Nimbus (the master process) and three Supervisors (where the processing is done)]
  • 68. Other features - Non JVM languages
      ● Use non-JVM languages
      ● Simple protocol
      ● Uses stdin and stdout for communication
      ● Many implementations and abstraction layers done by the community
      {
          "conf": {
              "topology.message.timeout.secs": 3,
              // etc
          },
          "context": {
              "task->component": {
                  "1": "example-spout",
                  "2": "__acker",
                  "3": "example-bolt"
              },
              "taskid": 3
          },
          "pidDir": "..."
      }
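
In Storm's multi-language protocol, each JSON message exchanged over stdin and stdout is terminated by a line containing only `end`. The sketch below shows just that framing, in plain Java with no JSON parsing; `MultilangFraming` and its methods are hypothetical helpers, not part of Storm:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Sketch of the message framing only: JSON text followed by a line "end".
final class MultilangFraming {
    // Appends the terminator line to a JSON message before it is written out.
    static String frame(String json) {
        return json + "\nend\n";
    }

    // Reads one message: every line up to, but not including, a line "end".
    // Returns null when the stream ends before any line is read.
    static String readMessage(BufferedReader in) {
        StringBuilder message = new StringBuilder();
        boolean sawLine = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals("end")) {
                    return message.toString();
                }
                if (sawLine) {
                    message.append('\n');
                }
                message.append(line);
                sawLine = true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sawLine ? message.toString() : null;
    }

    public static void main(String[] args) {
        String wire = frame("{\"taskid\": 3}") + frame("{\"command\": \"ack\"}");
        BufferedReader in = new BufferedReader(new StringReader(wire));
        System.out.println(readMessage(in)); // prints {"taskid": 3}
        System.out.println(readMessage(in)); // prints {"command": "ack"}
    }
}
```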
  • 69. Contribution
      ● A collection of community-developed spouts, bolts, serializers, and other goodies.
      ● You can find it at: https://github.com/nathanmarz/storm-contrib
      ● Some of those goodies are:
          ○ cassandra: Integrates Storm and Cassandra by writing tuples to a Cassandra Column Family.
          ○ mongodb: Provides a spout to read documents from mongo, and a bolt to write documents to mongo.
          ○ SQS: Provides a spout to consume messages from Amazon SQS.
          ○ hbase: Provides a spout and a bolt to read from and write to HBase.
          ○ kafka: Provides a spout to read messages from kafka.
          ○ rdbms: Provides a bolt to write tuples to a table.
          ○ redis-pubsub: Provides a spout to read messages from redis pubsub.
          ○ And more...