Introduction to Streaming Distributed Processing with Storm

Contact:
https://www.linkedin.com/in/brandonjobrien
@hakczar

Introduces streaming data concepts, Storm cluster architecture, and Storm topology architecture, and demonstrates a working example of a WordCount topology, for the SIGKDD Seattle chapter meetup.

Presented by Brandon O'Brien
Code example: https://github.com/OpenDataMining/brandonobrien
Meetup: http://www.meetup.com/seattlesigkdd/events/222955114/

Published in: Data & Analytics

  1. Introduction to Streaming Distributed Processing with Storm. Presenter: Brandon O’Brien, Data Engineer @ Expedia
  2. Outline
       • Distributed Systems & Batch Processing
       • Streaming Processing. Introduce Storm
       • WordCount Demo & Setup
       • Storm Cluster Architecture
       • Storm Topology Architecture
       • WordCount Deep Dive
       • Discussion and Q&A: Storm Use Cases & Patterns
  3. Distributed Systems
       • Distribute work across N nodes
       • Hadoop ecosystem
       • Batch processing
       • Massively parallel (horizontal scale-out)
       • Problems: data latency; 24-hour batching vs. a global client base
       • What’s next? Increasing need to move to real-time & streaming processing models
  4. Streaming Processing
       • Provides near-real-time views into analytical data sets and system status
       • Allows real-time intervention & response to events
       • Streaming frameworks: Spark, Azure Stream Analytics, AWS Kinesis+Lambda, Storm
       • Storm was created by Nathan Marz and first used at Twitter
       • Storm: “Doing for realtime processing what Hadoop did for batch processing”
       • Stream definition: “unbounded sequence of tuples”
  5. Storm WordCount Demo
       • WordCount Storm topology: streams text blobs, counts word occurrences, and reports results every 10 seconds
       • Getting it running:
           https://github.com/OpenDataMining/brandonobrien
           mvn clean install exec:java -Dexec.mainClass="dataclub.storm.TokenCountingTopology"
  6. Storm Cluster Architecture
       • Core components: ZooKeeper, Nimbus, Supervisors, Workers (JVMs), Executors (threads), Components/tasks (bolts & spouts)
       • Scalability: supervisors can be added while topologies are running; no code change required
       • Supervisors run Worker JVMs
       • Workers run Executor threads
       • Executors run Tasks (instances of Spouts and Bolts)
  7. Storm Topology Architecture
       • DAG (Directed Acyclic Graph) processing model
       • Components: Spout & Bolt (benefit: decouple logic from scalability)
       • Tasks (instances of Spouts & Bolts)
       • Executors (run Tasks)
  8. Storm WordCount Deep Dive
       • Topology structure
       • Classes:
           Spout: SentenceProducer.java
           Bolt: SentenceTokenizer.java
           Bolt: TokenCounter.java
       • Putting it all together: TokenCountingTopology.java
  9. Storm Use Cases & Patterns
       • Consume data from Kafka, Kinesis, or another queue
       • Persist data to a datastore with high write performance, such as Cassandra
       • Streaming map reduce, multi-stage map reduce
       • Storm is stateless & fail-fast. Externalize state using Redis or another cache for resiliency
       • Online learning / real-time model updates (using frameworks like WEKA or others)
       • Real-world use cases: real-time ad targeting, travel market analytics, user behavior analytics, system monitoring & SLA
       • Storm multi-lang API (Python, Ruby, Perl, JavaScript, Scala, and more)
  10. Distributed Streaming Processing with Storm
       • Going further:
           https://storm.apache.org/
           http://storm.apache.org/documentation/Common-patterns.html
           Frameworks: Trident, Summingbird
           Stand up a Storm cluster: http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/
       • Contact: Brandon O’Brien, Data Engineer @ Expedia, https://www.linkedin.com/in/brandonjobrien
       • Q&A
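
Slides 5 and 8 describe the WordCount topology as a spout that produces sentences, a bolt that tokenizes them, and a bolt that counts tokens. The data flow can be sketched in plain Java without any Storm dependency. This is an illustration only, not the code from the linked repository: the class name `WordCountSketch`, its method names, and the sample sentences are hypothetical, and a real spout would emit an unbounded stream rather than a fixed list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for the spout -> tokenizer bolt -> counter bolt flow (no Storm APIs used). */
final class WordCountSketch {

    /** Plays the role of the spout: the source of sentence tuples (a fixed list here). */
    static List<String> produceSentences() {
        return Arrays.asList("the quick brown fox", "the lazy dog", "The fox");
    }

    /** Plays the role of the tokenizer bolt: one sentence in, zero or more lower-cased words out. */
    static List<String> tokenize(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase());
            }
        }
        return words;
    }

    /** Plays the role of the counter bolt: keeps a running count per word. */
    static Map<String, Integer> count(List<String> sentences) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, only for stable printing
        for (String sentence : sentences) {
            for (String word : tokenize(sentence)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(count(produceSentences()));
    }
}
```

In the demo described on the slides, each of these three roles is a separate class wired together by the topology, and the counts are reported periodically rather than once at the end.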
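
Slides 6 and 7 say that tasks are instances of spouts and bolts, and that this split decouples logic from scalability. One way to picture that is with several instances of the same counting logic, each holding its own state, and a router that always sends a given word to the same instance. The plain-Java sketch below assumes hash-based routing, which is similar in spirit to what Storm calls a fields grouping. The slides do not say how the demo routes tuples, so treat the routing rule, the class `CounterTasksSketch`, and all of its members as hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simulates N "tasks" (instances) of one counting component; no Storm APIs used. */
final class CounterTasksSketch {

    /** One task: an instance of the counting logic with private state. */
    static final class CounterTask {
        final Map<String, Integer> counts = new HashMap<>();

        void onWord(String word) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    /** Assumed routing rule: the same word always maps to the same task index. */
    static int taskIndexFor(String word, int taskCount) {
        return Math.floorMod(word.hashCode(), taskCount);
    }

    /** Feeds every word to the task chosen by the routing rule. */
    static List<CounterTask> run(List<String> words, int taskCount) {
        List<CounterTask> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(new CounterTask());
        }
        for (String word : words) {
            tasks.get(taskIndexFor(word, taskCount)).onWord(word);
        }
        return tasks;
    }

    /** Combines per-task state into one view; each word lives in exactly one task. */
    static Map<String, Integer> merge(List<CounterTask> tasks) {
        Map<String, Integer> total = new HashMap<>();
        for (CounterTask task : tasks) {
            task.counts.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }
}
```

Calling `run(words, 1)` and `run(words, 3)` yields the same merged counts: the counting logic in `CounterTask` never changes, only the number of instances does.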
