Apache Storm Concepts
André Dias
andre.dias@summa.com.br
Highlights
● Background
● Storm’s History
● Concepts
● Integrations
● Real Cases
● Q&A
Background
Big Data V’s
Volume
Petabytes / Terabytes
Velocity
Real-time / Near Real-time
Variety
Sensors, Blog Posts, Logs, Social Networks...
Background
Data Streaming
Now Please… I’m in traffic…
Background
Data Value Chain
● Discovery
● Ingest
● Process
● Persist
● Analyze
● Expose
Background
Data Value Chain (Storm Focus)
Background
Data Value Chain (Ingest)
Stream Processing
Data in Motion
Batch Processing
Background
Lambda Architecture (LA)
● Data processing architecture
○ Generic
○ Scalable
○ Fault-tolerant (human/hardware)
● Low latency
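A minimal sketch of the Lambda Architecture idea, written in plain Python rather than against any Storm or Hadoop API. It assumes a batch view recomputed from the full dataset, a real-time view covering only events that arrived since the last batch run, and a query that merges both. All names here (batch_view, realtime_view, query) are hypothetical and chosen for illustration only.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute page-view counts from all stored events."""
    return Counter(event["page"] for event in master_dataset)

def realtime_view(recent_events):
    """Speed layer: count only events not yet absorbed by the batch layer."""
    return Counter(event["page"] for event in recent_events)

def query(page, batch, realtime):
    """Serving side: merge both views so answers include recent data."""
    return batch[page] + realtime[page]

master = [{"page": "/home"}, {"page": "/home"}, {"page": "/about"}]
recent = [{"page": "/home"}]

batch = batch_view(master)
speed = realtime_view(recent)
print(query("/home", batch, speed))   # 3
print(query("/about", batch, speed))  # 1
```

In this picture Storm would typically sit in the speed layer, while a batch system recomputes the batch view periodically.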
Storm’s History
Nathan Marz
● BackType (2008), acquired by Twitter (2011)
● Creator of the Lambda Architecture
● September 2011: Storm creation (open-source release)
● September 2013: Storm entered the ASF (Apache Incubator)
Why Use It?
● Scalable, like Hadoop
● No data loss, reliable
● Fault tolerant
● Language agnostic
● Real-time, real fast
● Storm is an Apache Top-Level Project (TLP)
Concepts
● Tuple
● Streams
● Spouts
● Bolts
● Topologies / Trident API
● Stream Groupings
Concepts
Spouts
● Source of Streams
● Consume data from external sources (ingestion)
● Emits Tuples
Concepts
Bolts
● Units of work applied to tuples
● Data streaming logic
● Can emit tuples as well
● Data store integration
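The spout and bolt roles above can be sketched as a chain of Python generators. This is a toy model, not the Storm API: the function names (sentence_spout, split_bolt, count_bolt) and the sample sentences are assumptions made for illustration, and everything runs in a single process.

```python
def sentence_spout():
    """Spout: the source of the stream; emits one tuple per sentence."""
    for sentence in ["storm is fast", "storm is reliable"]:
        yield (sentence,)

def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Bolt: keeps a running count per word and emits (word, count)."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])

results = list(count_bolt(split_bolt(sentence_spout())))
print(results[-1])  # ('reliable', 1)
```

A final bolt could write each (word, count) tuple to a data store, which is the integration role mentioned in the list.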
Concepts
Topology
● Data Streaming Flow Representation
● DAG (Directed Acyclic Graph) of Spouts and Bolts
● Streaming computation
● Each node runs as individual tasks (parallel execution)
● Stateless
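As a rough illustration of a topology as a directed acyclic graph, the sketch below describes components and their upstream sources in a plain dictionary and checks that the wiring is acyclic. It uses only Python's standard graphlib module (Python 3.9+), not Storm's TopologyBuilder; the component names and the processing_order helper are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Each component maps to the components it reads from (its upstream sources).
topology = {
    "sentence-spout": [],
    "split-bolt": ["sentence-spout"],
    "count-bolt": ["split-bolt"],
    "store-bolt": ["count-bolt"],
}

def processing_order(graph):
    """Return components so that every source precedes its consumers.

    Raises ValueError if the graph contains a cycle, i.e. it is not a DAG.
    """
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError("topology is not acyclic") from err

order = processing_order(topology)
print(order)  # ['sentence-spout', 'split-bolt', 'count-bolt', 'store-bolt']
```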
Concepts
Trident API
● Abstraction Layer over low-level Storm API
● More Complex Topologies
● Stateful
● Micro-batch
● High-level API (similar to Pig / Cascading on Hadoop)
● Messages processed at least once (guaranteed); Trident state updates can achieve exactly-once semantics
Concepts
Trident API - Micro Batch
● Trident Batches
○ are Ordered
○ can be Partitioned
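To make the two properties of Trident batches concrete (ordered, and partitionable), here is a small plain-Python sketch. It is not the Trident API: micro_batches and partition are hypothetical helpers, the batch size and partition count are arbitrary, and a CRC32 of the first tuple field is assumed as the partitioning function.

```python
import zlib
from itertools import islice

def micro_batches(stream, batch_size):
    """Group a tuple stream into consecutive batches with increasing ids."""
    iterator = iter(stream)
    batch_id = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch_id, batch  # ids reflect arrival order
        batch_id += 1

def partition(batch, num_partitions):
    """Split one batch into partitions by hashing each tuple's first field."""
    parts = [[] for _ in range(num_partitions)]
    for tup in batch:
        key = str(tup[0]).encode("utf-8")
        parts[zlib.crc32(key) % num_partitions].append(tup)
    return parts

events = [(user, "click") for user in [1, 2, 3, 1, 2, 3, 1]]
for batch_id, batch in micro_batches(events, batch_size=3):
    print(batch_id, partition(batch, num_partitions=2))
```

Each partition of a batch could then be processed in parallel, while the batch ids keep the overall order.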
Concepts
Stream Groupings
● Control how tuples flow between components in a topology
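Two commonly used groupings in Storm are shuffle grouping (tuples spread across tasks) and fields grouping (tuples with the same field value go to the same task). The sketch below only models the routing decision in plain Python; it is not Storm's implementation, and the function signatures, the CRC32 hash, and the task count are assumptions for illustration.

```python
import random
import zlib

def shuffle_grouping(tup, num_tasks, rng=random):
    """Shuffle grouping: pick a target task at random to spread the load."""
    return rng.randrange(num_tasks)

def fields_grouping(tup, num_tasks, field_index=0):
    """Fields grouping: equal field values always map to the same task."""
    key = str(tup[field_index]).encode("utf-8")
    # CRC32 is stable across runs, unlike the built-in hash() for strings.
    return zlib.crc32(key) % num_tasks

words = [("storm",), ("bolt",), ("storm",), ("spout",), ("storm",)]
for tup in words:
    print(tup, "-> task", fields_grouping(tup, num_tasks=4))
```

Fields grouping matters for bolts that keep per-key counts, such as a word counter: every occurrence of the same word must reach the same task for the count to be correct.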
Architecture
Components - Nimbus
● Master node (similar to Hadoop's JobTracker)
● Monitors and distributes the processing workload across worker nodes
● Stores its state in ZooKeeper
Architecture
Components - Supervisor
● Worker node (similar to Hadoop's TaskTracker)
● Starts and stops worker processes according to the assignments made by Nimbus
● Keeps its state in ZooKeeper
Architecture
Overview
● Master-slave approach
● Cluster coordination (ZooKeeper)
● Nimbus HA
Integrations
Real Cases
● Collecting sensor information into a Data Lake
● Micro-batching user content, content feeds and application logs
● Real-time user music recommendations
Q&A
THANK YOU!
Use Storm
