SlideShare uma empresa Scribd logo
1 de 56
Baixar para ler offline
1
GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore 2015
Sean Zhong
Xiang.zhong@intel.com, Intel Software
2
What is Gearpump
• Akka based lightweight Real time data processing platform.
• Apache License http://gearpump.io version 0.7
• Akka:
• Communication,
concurrency, Isolation,
and fault-tolerant
Simple and Powerful
Message level streaming
Long running daemons
What is Akka?
3
What is Akka?
• Micro-service(Actor) oriented.
• Message Driven
• Lock-free
• Location-transparent
It is like our human society, driven by message
Which can scale to 7 billion population!
4
Micro-service oriented
higher abstraction
• Break your application into Micro services
instead of object.
• Throw away locks
• Use Immutable Async message to exchange
information between micro-service instead of
shared object.
5
Gearpump in Big Data Stack
store
visualization
batch stream
SQL
Catalyst
StreamSQL
Impala
Cluster manager monitor/alert/notify
Machine learning
Graphx
Cloudera
Manager
Storage
Engine
Analytics
Visualization
& management
storm
Data explore
Gearpump
6
Why another streaming platform?
• The requirements are not fully met.
• A higher abstraction like micro-service oriented can largely simplify
the problem.
7
What we want
• Meet The 8 Requirements of Real-Time Stream
Processing (2006)
7
Flexible Volume Speed Accuracy Visual
Any where
Any size
Any source
Any use case
dynamic DAG
②StreamSQL
High
throughput
⑦Scale linearly
①In-Stream
Zero latency
⑥HA
⑧Responsive
Exactly-once
③Message
loss/delay/out
of order
④Predictable
Easy to debug
WYSWYG
8
9
DAG representation and API
Graph(A~>B~>C~>D, B~>E~>D)
Low level Graph API Syntax:
A B C D
E
Processor
Processor Processor Processor
Processor
DAG
Shuffle
Field
grouping
Field
grouping
10
Architecture - Actor Hierarchy
Client
Hook in and query state
As general service
YARN
Each App has one isolated AppMaster, and use Actor Supervision
tree for error handling.
Master Cluster
HA Design
11
WorkerWorkerWorker
Master
standby
Master
Standby
Master
State
Gossip
Architecture - Master HA (no SPOF)
• Akka Cluster for a centerless HA system
• Conflict free data types(CRDT) for consistency
CRDT Data type example:
Decentralized: Not rely on single central meta server
leader
12
Feature Highlights
Akka/Akka-
stream/Storm
compatible
Throughput
14 million/s (*)
2ms Latency(*)
Exactly-once Dynamic DAG
Out of Order
Message
Flexible DSL
DAG
Visualization
Internet of
Thing
function
usability
[*] Test environment: Intel® Xeon™ Processor E5-2690, 4 nodes, each node has 32 CPU cores, and 64GB memory, 10GbE network.
We use default configuration of Gearpump 0.7 release. See backup page for more details.
We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark) (tested carried by Intel team)
13
14
Three steps to use it
1. Download binary from http://gearpump.io
2. Submit jar by UI
3. Monitor Status
15
Application Submission Flow
Master
Workers
1. Submit a Jar
Master
Workers
AppMaster 2. Create AppMaster
Master
Workers
AppMaster
YARN
Master
Workers
AppMaster Executor Executor Executor
1 2
43
154. Report Executor to AppMaster
YARN or without YARN
16
Low level Graph API - WordCount
val context = new ClientContext()
val split = Processor[Split](splitParallism)
val sum = Processor[Sum](sumParallism)
val app = StreamApplication("wordCount", Graph(split ~> sum), UserConfig.empty)
val appId = context.submit(app)
context.close()
class Split(taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
override def onNext(msg : Message) : Unit = { /* split the line */ }
}
class Sum (taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
val count = /**count of words **/
override def onNext(msg : Message) : Unit = {/* do aggregation on word*/}
}
Scala Java
17
High Level DSL API - WordCount
val context = ClientContext()
val app = new StreamApp("dsl", context)
val data = "This is a good start, bingo!! bingo!!"
app.fromCollection(data.lines)
// word => (word, count = 1)
.flatMap(line => line.split("[s]+")).map((_, 1))
// (word, count1), (word, count2) => (word, count1 + count2)
.groupByKey().sum.log
val appId = context.submit(app)
context.close()
18
Akka-stream API – WordCount
implicit val system = ActorSystem("akka-test")
implicit val materializer = new GearpumpMaterializer(system)
val echo = system.actorOf(Props(new Echo()))
val sink = Sink.actorRef(echo, "COMPLETE")
val source = GearSource.from[String](new CollectionDataSource(lines))
source.mapConcat{line => line.split(" ").toList}
.groupBy2(x=>x)
.map(word => (word, 1))
.reduce {(a, b) => (a._1, a._2 + b._2)}
.log("word-count").runWith(sink)
Available at branch https://github.com/gearpump/gearpump/tree/akkastream
19
DAG Page
UI Portal - DAG Visualization
Track global min-Clock
of all message DAG:
• Node size reflect throughput
• Edge width represents flow rate
• Red node means something goes wrong
19
20
UI Portal – Processor Detail
Processor Page
Data skew distribution Task throughput and latency
21
22
Throughput and Latency
Throughput: 11 million message/second
Latency: 17ms on full load
SOL Shuffle test
32 tasks->32 tasks
[*] Test environment: Intel® Xeon™ Processor E5-2680, 4 nodes, each node has 32 CPU cores, and 64GB memory, 10GbE network. (tested carried by Intel team)
We use default configuration of Gearpump 0.2 release. We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark)
23
Scalability
• Test run on 100 nodes(*) and 3000 tasks
• Gearpump performance scales:
100 nodes
[*] We use 8 machines to simulate 100 worker nodes
Test environment: Intel® Xeon™ Processor E5-2680, each node has 32 CPU cores, and 64GB memory, 10GbE network. (tested carried by Intel team)
We use default configuration of Gearpump 0.3.5 release. We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark)
24
Fault-Tolerance: Recovery time
[*]: Recovery time is the time interval between: a) failure happen b) all tasks in topology resume processing data.
91 worker nodes, 1000 tasks
Test environment: Intel® Xeon™ Processor E5-2680 91 worker nodes, 1000 tasks (We use 7 machines to simulate 91 worker nodes). Each node has 32 CPU cores, and 64GB
memory, 10GbE network. We use default configuration of Gearpump 0.3.5 release. We use the SOL workload (https://github.com/intel-hadoop/storm-benchmark) (by Intel team)
25
High performance Messaging Layer
• Akka remote message has a big overhead, (sender + receiver address)
• Reduce 95% overhead (400 bytes to ~20 bytes)
Effective
batching
convert from
short address
convert to short address
Sync with other executors
26
Effective batching
Network Idle: Flush as fast as we can
Network Busy: Smart batching until the network is open again.
This feature is ported from Storm-297
Network Bandwidth
Doubled
For 100 byte per message
Test environment: Same as storm-297
27
High performance flow Control
Pass back-pressure level-by-level
About ~1% throughput impact
TaskTaskTask TaskTaskTask TaskTaskTask TaskTaskTask
Back-pressure
Sliding window
Another option(not used): big-loop-feedback flow control
1. NO central ack nodes
2. Each level knows
network status, thus
can optimize the
network at best
28
29
What is a good use case for Gearpump?
When you want exactly-once message processing, with millisecond
latency.
When you want to integrate with Akka and Akka-stream transparently.
When you want dynamic modification of online DAG
When you want to connect with IoT edge device, location tranparency.
When you want a simple scheduler to distribute customized
application, like collecting logs, distributed cron…
When you want to use Monoid state like Count-Min Sketch…
Besides, it can integrate with: YARN, Kafka, Storm and etc..
30
IoT Transparent Cloud
Location transparent. Unified programming model across the boundary.
log
Data Center
dag on
device side
Target Problem Large gap between edge devices and data center
case: Intelligent Traffic System, 3000 traffic lights, travel time, overspeed detection…
31
Exactly-once: Financial use cases
Target Problem both real-time and accuracy are important
realtime
Stock index
Crawlers
Process Alerts Actions Reports
Rules
Transaction
AccountOther
data
source
Programing trading
32
Delete
Transformers: Dynamic DAG
Add
change parallelism online to scale out without message loss
add/remove source/sink processor dynamically
B
Target Problem No existing way to manipulate the DAG on the fly
Each Processor can has its own independent jar
Replace
It can
33
Eve: Online Machine Learning
• Decide immediately based online learning result
sensor
Learn
Predict
Decide
Input
Output
Target Problem ML train and predict online for real-time decision support.
TAP analytics platform integration: http://trustedanalytics.org/
34
35
General ideas
State
Minclock service
Replayable
Source DAG
MinClock service track the min
timestamp of all pending messages in the
system now and future
Message(timestamp)
Every message in the system
Has a application
timestamp(message birth time)
Normal Flow
36
General ideas
State
Minclock service
Replayable
Source
Checkpoint Store
Message(timestamp)
④Exactly-once State can be reconstructed by message replay:
①Detect Message loss at Tp
② clock pause at Tp
③ replay from Tp
Recovery Flow
DAG
37
1. Detect Message loss
2. Pause the clock at Tp when message is lost
3. Replay from clock Tp
4. Exactly-once
38
Detect Failure in time
AckRequest and Ack to
detect Message loss:
Easy to trouble-shoot
When An error happen, we know
 When
 Where
 Why
Master
AppMaster
Executor
Task
Failure
Failure
Failure
39
Executor Executor
AppMaster
TaskTaskTask
Store
Source
Global clock
service
① error
detected
②Fence zombie
Recover the runtime when machine crashed
Use dynamic session ID to
fence zombies
Send message
1. Quarantine
40
Executor
AppMaster
Task
Store
Source
Replay
Global clock
service
① error
detected
②isolate zombie
Executor
Task
③ Recover
DAG Recovery: Quarantine and Recover
2. Recover the executor JVM, and replay message
Send message
41
1. Detect Message loss
2. Pause the clock at Tp when message is lost
3. Replay from clock Tp
4. Exactly-once
42
Application’s Clock Service
Definition: Task min-clock is
Minimum of (
min timestamp of pending-messages in current task
Task min-Clock of all upstream tasks
)
Report task min-clock
Level Clock
A
B
C
D
E
Later
Earlier
1000
800
600
400
Ever incremental
43
1. Detect Message loss
2. Pause the clock at Tp when message is lost
3. Replay from clock Tp
4. Exactly-once
44
Source-based Message Replay
Normal Flow
Replay from the very-beginning source
Source
Like offset of kafka queue -> message timestamp
45
Source-based Message Replay
Replay from the very-beginning source
Global Clock Service
①Resolve offset with timestamp Tp②Replay from offset
Source
Recovery Flow
46
1. Detect Message loss
2. Clock pause at Tp when message is lost
3. Replay from clock Tp
4. Exactly-once
47
Checkpoint Store
DAG runtime
Key: Ensure State(t) only contains
message(timestamp <= t)
Exactly-once message processing
How?
append only checkpoint
48
Exactly-once message processing
Two states
State
Accept (t < Tc)
State
Accept all
Checkpoint Store
Streaming SystemMessages
Normal Flow
49
Exactly-once message processing
Two states
State
Accept (t < Tc)
State
Accept all
Checkpoint Store
Streaming SystemMessages
Recover checkpoint in failures
Recovery Flow
50
How to do Dynamic DAG?
Multiple-Version DAG
A B C
B’Message.time >= Tc
Target Problem: Replace processor B with B’ at time Tc
DAG(Version = 0) DAG(Version = 1)
transit
Message.time < Tc
NO message loss during the transition
A B C
51
52
Live demo
53
References
• 钟翔 大数据时代的软件架构范式:Reactive架构及Akka实践, 程序员期刊2015年2A期
• Gearpump whitepaper http://typesafe.com/blog/gearpump-real-time-streaming-engine-using-akka
• 吴甘沙 低延迟流处理系统的逆袭, 程序员期刊2013年10期
• Stonebraker http://cs.brown.edu/~ugur/8rulesSigRec.pdf
• https://github.com/intel-hadoop/gearpump
• Gearpump: https://github.com/intel-hadoop/gearpump
• http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
• https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-
three-cheap-machines
• Sqlstream http://www.sqlstream.com/customers/
• http://www.statsblogs.com/2014/05/19/a-general-introduction-to-stream-processing/
• http://www.statalgo.com/2014/05/28/stream-processing-with-messaging-systems/
• Gartner report on IOT http://www.zdnet.com/article/internet-of-things-devices-will-dwarf-number-
of-pcs-tablets-and-smartphones/
54
Legal Disclaimers
1 This document contains information on products, services and/or processes in development. All
information provided here is subject to change without notice. Contact your Intel representative
to obtain the latest forecast, schedule, specifications and roadmaps.
2 Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. No computer system can be absolutely secure. Check with your system
manufacturer or retailer or learn more at [intel.com].
3 Software and workloads used in performance tests may have been optimized for performance
only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
measured using specific computer systems, components, software, operations and functions. Any
change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases,
including the performance of that product when combined with other products.
§ For more information go to http://www.intel.com/performance.
Intel, the Intel logo, Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2015 Intel Corporation
55
Latest Performance evaluation on Gearpump 0.7
This is the latest
performance test on
version 0.7. Please see
the embedded
document on the right
to see the configuration
details.
Strata Singapore: GearpumpReal time DAG-Processing with Akka at Scale

Mais conteúdo relacionado

Mais procurados

Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacApache Apex
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Monal Daxini
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Monal Daxini
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...Codemotion Tel Aviv
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasFlink Forward
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareApache Apex
 
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...Flink Forward
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasMonal Daxini
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingApache Apex
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormRan Silberman
 
Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Bhupesh Chawda
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Julian Hyde
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexApache Apex
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 

Mais procurados (20)

Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
 
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F...
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
 
ApacheCon BigData Europe 2015
ApacheCon BigData Europe 2015 ApacheCon BigData Europe 2015
ApacheCon BigData Europe 2015
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
 
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch -  Dynami...
Flink Forward SF 2017: David Hardwick, Sean Hester & David Brelloch - Dynami...
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a...
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & Storm
 
Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016Introduction to Apache Apex - CoDS 2016
Introduction to Apache Apex - CoDS 2016
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 

Destaque

Enterprise Mobility Transforming Public Service and Citizen Engagement
Enterprise Mobility Transforming Public Service and Citizen EngagementEnterprise Mobility Transforming Public Service and Citizen Engagement
Enterprise Mobility Transforming Public Service and Citizen EngagementSAP Asia Pacific
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
 
8281 Gefahr der Überheblichkeit ....
8281   Gefahr der Überheblichkeit ....8281   Gefahr der Überheblichkeit ....
8281 Gefahr der Überheblichkeit ....Marianne Zipf
 
Lean Startup for AaltoES Summer of Startups
Lean Startup for AaltoES Summer of StartupsLean Startup for AaltoES Summer of Startups
Lean Startup for AaltoES Summer of StartupsMarko Taipale
 
Spring 2011 Current Projects
Spring 2011 Current ProjectsSpring 2011 Current Projects
Spring 2011 Current Projectsgrantml
 
JavaScript Unit Testing
JavaScript Unit TestingJavaScript Unit Testing
JavaScript Unit TestingKeir Bowden
 
Slide 04 - Otaciso, Digelvânia e Silvana
Slide 04 - Otaciso, Digelvânia e SilvanaSlide 04 - Otaciso, Digelvânia e Silvana
Slide 04 - Otaciso, Digelvânia e Silvanarafaelly04
 
Ana maria araujo calderon trabajo social redvolucion leidis
Ana maria araujo calderon trabajo social redvolucion leidisAna maria araujo calderon trabajo social redvolucion leidis
Ana maria araujo calderon trabajo social redvolucion leidisRedvolucionCesarNorte
 
Brightspace Spring Release 2016
Brightspace Spring Release 2016Brightspace Spring Release 2016
Brightspace Spring Release 2016D2L
 
3 ways to value your business
3 ways to value your business3 ways to value your business
3 ways to value your businessEquidam
 
GBI 2016 Asia - Hong Kong Launch Presentation 270916
GBI 2016 Asia - Hong Kong Launch Presentation 270916GBI 2016 Asia - Hong Kong Launch Presentation 270916
GBI 2016 Asia - Hong Kong Launch Presentation 270916Ipsos UK
 
Golden rules from online community facilitators
Golden rules from online  community facilitatorsGolden rules from online  community facilitators
Golden rules from online community facilitatorsMichael Norton
 
20151205japanr
20151205japanr20151205japanr
20151205japanrMed_KU
 
Iconic AP images of the 20th century
Iconic AP images of the 20th centuryIconic AP images of the 20th century
Iconic AP images of the 20th centuryguimera
 
Android app development – what you should not
Android app development – what you should notAndroid app development – what you should not
Android app development – what you should notmanisha ips
 
How Reebok Uses Employee Generated Content to Amplify Advocacy
How Reebok Uses Employee Generated Content to Amplify Advocacy How Reebok Uses Employee Generated Content to Amplify Advocacy
How Reebok Uses Employee Generated Content to Amplify Advocacy SocialChorus
 

Destaque (20)

Enterprise Mobility Transforming Public Service and Citizen Engagement
Enterprise Mobility Transforming Public Service and Citizen EngagementEnterprise Mobility Transforming Public Service and Citizen Engagement
Enterprise Mobility Transforming Public Service and Citizen Engagement
 
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
2015 01-17 Lambda Architecture with Apache Spark, NextML Conference
 
8281 Gefahr der Überheblichkeit ....
8281   Gefahr der Überheblichkeit ....8281   Gefahr der Überheblichkeit ....
8281 Gefahr der Überheblichkeit ....
 
Lean Startup for AaltoES Summer of Startups
Lean Startup for AaltoES Summer of StartupsLean Startup for AaltoES Summer of Startups
Lean Startup for AaltoES Summer of Startups
 
Spring 2011 Current Projects
Spring 2011 Current ProjectsSpring 2011 Current Projects
Spring 2011 Current Projects
 
딥 시큐리티
딥 시큐리티딥 시큐리티
딥 시큐리티
 
Open Government Data - Zwei Beispiele der Community-Einbindung
Open Government Data - Zwei Beispiele der Community-EinbindungOpen Government Data - Zwei Beispiele der Community-Einbindung
Open Government Data - Zwei Beispiele der Community-Einbindung
 
JavaScript Unit Testing
JavaScript Unit TestingJavaScript Unit Testing
JavaScript Unit Testing
 
CA concepts, principles and practices Kitui
CA concepts, principles and practices   KituiCA concepts, principles and practices   Kitui
CA concepts, principles and practices Kitui
 
Slide 04 - Otaciso, Digelvânia e Silvana
Slide 04 - Otaciso, Digelvânia e SilvanaSlide 04 - Otaciso, Digelvânia e Silvana
Slide 04 - Otaciso, Digelvânia e Silvana
 
Ana maria araujo calderon trabajo social redvolucion leidis
Ana maria araujo calderon trabajo social redvolucion leidisAna maria araujo calderon trabajo social redvolucion leidis
Ana maria araujo calderon trabajo social redvolucion leidis
 
Brightspace Spring Release 2016
Brightspace Spring Release 2016Brightspace Spring Release 2016
Brightspace Spring Release 2016
 
3 ways to value your business
3 ways to value your business3 ways to value your business
3 ways to value your business
 
Инновации в рекламном бизнесе
Инновации в рекламном бизнесеИнновации в рекламном бизнесе
Инновации в рекламном бизнесе
 
GBI 2016 Asia - Hong Kong Launch Presentation 270916
GBI 2016 Asia - Hong Kong Launch Presentation 270916GBI 2016 Asia - Hong Kong Launch Presentation 270916
GBI 2016 Asia - Hong Kong Launch Presentation 270916
 
Golden rules from online community facilitators
Golden rules from online  community facilitatorsGolden rules from online  community facilitators
Golden rules from online community facilitators
 
20151205japanr
20151205japanr20151205japanr
20151205japanr
 
Iconic AP images of the 20th century
Iconic AP images of the 20th centuryIconic AP images of the 20th century
Iconic AP images of the 20th century
 
Android app development – what you should not
Android app development – what you should notAndroid app development – what you should not
Android app development – what you should not
 
How Reebok Uses Employee Generated Content to Amplify Advocacy
How Reebok Uses Employee Generated Content to Amplify Advocacy How Reebok Uses Employee Generated Content to Amplify Advocacy
How Reebok Uses Employee Generated Content to Amplify Advocacy
 

Semelhante a Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale

Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureRobert Metzger
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applicationsDing Li
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Robert Metzger
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseKostas Tzoumas
 
Nagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
Nagios Conference 2007 | Nagios in very large Environments by Werner NeunteuflNagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
Nagios Conference 2007 | Nagios in very large Environments by Werner NeunteuflNETWAYS
 
Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Jacob Maes
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...confluent
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEventsNeil Avery
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsThomas Weise
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications Comsysto Reply GmbH
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...netvis
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
Distributed caching-computing v3.8
Distributed caching-computing v3.8Distributed caching-computing v3.8
Distributed caching-computing v3.8Rahul Gupta
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 

Semelhante a Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale (20)

Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architecture
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
Nagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
Nagios Conference 2007 | Nagios in very large Environments by Werner NeunteuflNagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
Nagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
 
Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017Unified Stream Processing at Scale with Apache Samza - BDS2017
Unified Stream Processing at Scale with Apache Samza - BDS2017
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
 
Our Methodology & Benefits
Our Methodology & BenefitsOur Methodology & Benefits
Our Methodology & Benefits
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEvents
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Distributed caching-computing v3.8
Distributed caching-computing v3.8Distributed caching-computing v3.8
Distributed caching-computing v3.8
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 

Último

Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleanscorenetworkseo
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxeditsforyah
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 

Último (20)

Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleans
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptx
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 

Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale

  • 1. 1 GearpumpReal time DAG-Processing with Akka at Scale Strata Singapore 2015 Sean Zhong Xiang.zhong@intel.com, Intel Software
  • 2. 2 What is Gearpump • Akka based lightweight Real time data processing platform. • Apache License http://gearpump.io version 0.7 • Akka: • Communication, concurrency, Isolation, and fault-tolerant Simple and Powerful Message level streaming Long running daemons What is Akka?
  • 3. 3 What is Akka? • Micro-service(Actor) oriented. • Message Driven • Lock-free • Location-transparent It is like our human society, driven by message Which can scale to 7 billion population!
  • 4. 4 Micro-service oriented higher abstraction • Break your application into Micro services instead of object. • Throw away locks • Use Immutable Async message to exchange information between micro-service instead of shared object.
  • 5. 5 Gearpump in Big Data Stack store visualization batch stream SQL Catalyst StreamSQL Impala Cluster manager monitor/alert/notify Machine learning Graphx Cloudera Manager Storage Engine Analytics Visualization & management storm Data explore Gearpump
  • 6. 6 Why another streaming platform? • The requirements are not fully met. • A higher abstraction like micro-service oriented can largely simplify the problem.
  • 7. 7 What we want • Meet The 8 Requirements of Real-Time Stream Processing (2006) 7 Flexible Volume Speed Accuracy Visual Any where Any size Any source Any use case dynamic DAG ②StreamSQL High throughput ⑦Scale linearly ①In-Stream Zero latency ⑥HA ⑧Responsive Exactly-once ③Message loss/delay/out of order ④Predictable Easy to debug WYSWYG
  • 8. 8
  • 9. 9 DAG representation and API Graph(A~>B~>C~>D, B~>E~>D) Low level Graph API Syntax: A B C D E Processor Processor Processor Processor Processor DAG Shuffle Field grouping Field grouping
  • 10. 10 Architecture - Actor Hierarchy Client Hook in and query state As general service YARN Each App has one isolated AppMaster, and use Actor Supervision tree for error handling. Master Cluster HA Design
  • 11. 11 WorkerWorkerWorker Master standby Master Standby Master State Gossip Architecture - Master HA (no SPOF) • Akka Cluster for a centerless HA system • Conflict free data types(CRDT) for consistency CRDT Data type example: Decentralized: Not rely on single central meta server leader
  • 12. 12 Feature Highlights Akka/Akka- stream/Storm compatible Throughput 14 million/s (*) 2ms Latency(*) Exactly-once Dynamic DAG Out of Order Message Flexible DSL DAG Visualization Internet of Thing function usability [*] Test environment: Intel® Xeon™ Processor E5-2690, 4 nodes, each node has 32 CPU cores, and 64GB memory, 10GbE network. We use default configuration of Gearpump 0.7 release. See backup page for more details. We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark) (tested carried by Intel team)
  • 13. 13
  • 14. 14 Three steps to use it 1. Download binary from http://gearpump.io 2. Submit jar by UI 3. Monitor Status
  • 15. 15 Application Submission Flow Master Workers 1. Submit a Jar Master Workers AppMaster 2. Create AppMaster Master Workers AppMaster YARN Master Workers AppMaster Executor Executor Executor 1 2 43 154. Report Executor to AppMaster YARN or without YARN
  • 16. 16 Low level Graph API - WordCount val context = new ClientContext() val split = Processor[Split](splitParallism) val sum = Processor[Sum](sumParallism) val app = StreamApplication("wordCount", Graph(split ~> sum), UserConfig.empty) val appId = context.submit(app) context.close() class Split(taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) { override def onNext(msg : Message) : Unit = { /* split the line */ } } class Sum (taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) { val count = /**count of words **/ override def onNext(msg : Message) : Unit = {/* do aggregation on word*/} } Scala Java
  • 17. 17 High Level DSL API - WordCount val context = ClientContext() val app = new StreamApp("dsl", context) val data = "This is a good start, bingo!! bingo!!" app.fromCollection(data.lines) // word => (word, count = 1) .flatMap(line => line.split("[s]+")).map((_, 1)) // (word, count1), (word, count2) => (word, count1 + count2) .groupByKey().sum.log val appId = context.submit(app) context.close()
  • 18. 18 Akka-stream API – WordCount implicit val system = ActorSystem("akka-test") implicit val materializer = new GearpumpMaterializer(system) val echo = system.actorOf(Props(new Echo())) val sink = Sink.actorRef(echo, "COMPLETE") val source = GearSource.from[String](new CollectionDataSource(lines)) source.mapConcat{line => line.split(" ").toList} .groupBy2(x=>x) .map(word => (word, 1)) .reduce {(a, b) => (a._1, a._2 + b._2)} .log("word-count").runWith(sink) Available at branch https://github.com/gearpump/gearpump/tree/akkastream
  • 19. 19 DAG Page UI Portal - DAG Visualization Track global min-Clock of all message DAG: • Node size reflect throughput • Edge width represents flow rate • Red node means something goes wrong 19
  • 20. 20 UI Portal – Processor Detail Processor Page Data skew distribution Task throughput and latency
  • 21. 21
  • 22. 22 Throughput and Latency Throughput: 11 million message/second Latency: 17ms on full load SOL Shuffle test 32 tasks->32 tasks [*] Test environment: Intel® Xeon™ Processor E5-2680, 4 nodes, each node has 32 CPU cores, and 64GB memory, 10GbE network. (tested carried by Intel team) We use default configuration of Gearpump 0.2 release. We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark)
  • 23. 23 Scalability • Test run on 100 nodes(*) and 3000 tasks • Gearpump performance scales: 100 nodes [*] We use 8 machines to simulate 100 worker nodes Test environment: Intel® Xeon™ Processor E5-2680, each node has 32 CPU cores, and 64GB memory, 10GbE network. (tested carried by Intel team) We use default configuration of Gearpump 0.3.5 release. We use the SOL workload, message size 100 bytes, (https://github.com/intel-hadoop/storm-benchmark)
  • 24. 24 Fault-Tolerance: Recovery time [*]: Recovery time is the time interval between: a) failure happen b) all tasks in topology resume processing data. 91 worker nodes, 1000 tasks Test environment: Intel® Xeon™ Processor E5-2680 91 worker nodes, 1000 tasks (We use 7 machines to simulate 91 worker nodes). Each node has 32 CPU cores, and 64GB memory, 10GbE network. We use default configuration of Gearpump 0.3.5 release. We use the SOL workload (https://github.com/intel-hadoop/storm-benchmark) (by Intel team)
  • 25. 25 High performance Messaging Layer • Akka remote message has a big overhead, (sender + receiver address) • Reduce 95% overhead (400 bytes to ~20 bytes) Effective batching convert from short address convert to short address Sync with other executors
  • 26. 26 Effective batching Network Idle: Flush as fast as we can Network Busy: Smart batching until the network is open again. This feature is ported from Storm-297 Network Bandwidth Doubled For 100 byte per message Test environment: Same as storm-297
  • 27. 27 High performance flow Control Pass back-pressure level-by-level About ~1% throughput impact TaskTaskTask TaskTaskTask TaskTaskTask TaskTaskTask Back-pressure Sliding window Another option(not used): big-loop-feedback flow control 1. NO central ack nodes 2. Each level knows network status, thus can optimize the network at best
  • 28. 28
  • 29. 29 What is a good use case for Gearpump? When you want exactly-once message processing, with millisecond latency. When you want to integrate with Akka and Akka-stream transparently. When you want dynamic modification of online DAG When you want to connect with IoT edge device, location tranparency. When you want a simple scheduler to distribute customized application, like collecting logs, distributed cron… When you want to use Monoid state like Count-Min Sketch… Besides, it can integrate with: YARN, Kafka, Storm and etc..
  • 30. 30 IoT Transparent Cloud Location transparent. Unified programming model across the boundary. log Data Center dag on device side Target Problem Large gap between edge devices and data center case: Intelligent Traffic System, 3000 traffic lights, travel time, overspeed detection…
  • 31. 31 Exactly-once: Financial use cases Target Problem both real-time and accuracy are important realtime Stock index Crawlers Process Alerts Actions Reports Rules Transaction AccountOther data source Programing trading
  • 32. 32 Delete Transformers: Dynamic DAG Add change parallelism online to scale out without message loss add/remove source/sink processor dynamically B Target Problem No existing way to manipulate the DAG on the fly Each Processor can has its own independent jar Replace It can
  • 33. 33 Eve: Online Machine Learning • Decide immediately based online learning result sensor Learn Predict Decide Input Output Target Problem ML train and predict online for real-time decision support. TAP analytics platform integration: http://trustedanalytics.org/
  • 34. 34
  • 35. 35 General ideas State Minclock service Replayable Source DAG MinClock service track the min timestamp of all pending messages in the system now and future Message(timestamp) Every message in the system Has a application timestamp(message birth time) Normal Flow
  • 36. 36 General ideas State Minclock service Replayable Source Checkpoint Store Message(timestamp) ④Exactly-once State can be reconstructed by message replay: ①Detect Message loss at Tp ② clock pause at Tp ③ replay from Tp Recovery Flow DAG
  • 37. 37 1. Detect Message loss 2. Pause the clock at Tp when message is lost 3. Replay from clock Tp 4. Exactly-once
  • 38. 38 Detect Failure in time AckRequest and Ack to detect Message loss: Easy to trouble-shoot When An error happen, we know  When  Where  Why Master AppMaster Executor Task Failure Failure Failure
  • 39. 39 Executor Executor AppMaster TaskTaskTask Store Source Global clock service ① error detected ②Fence zombie Recover the runtime when machine crashed Use dynamic session ID to fence zombies Send message 1. Quarantine
  • 40. 40 Executor AppMaster Task Store Source Replay Global clock service ① error detected ②isolate zombie Executor Task ③ Recover DAG Recovery: Quarantine and Recover 2. Recover the executor JVM, and replay message Send message
  • 41. 41 1. Detect Message loss 2. Pause the clock at Tp when message is lost 3. Replay from clock Tp 4. Exactly-once
  • 42. 42 Application’s Clock Service Definition: Task min-clock is Minimum of ( min timestamp of pending-messages in current task Task min-Clock of all upstream tasks ) Report task min-clock Level Clock A B C D E Later Earlier 1000 800 600 400 Ever incremental
  • 43. 43 1. Detect Message loss 2. Pause the clock at Tp when message is lost 3. Replay from clock Tp 4. Exactly-once
  • 44. 44 Source-based Message Replay Normal Flow Replay from the very-beginning source Source Like offset of kafka queue -> message timestamp
  • 45. 45 Source-based Message Replay Replay from the very-beginning source Global Clock Service ①Resolve offset with timestamp Tp②Replay from offset Source Recovery Flow
  • 46. 46 1. Detect Message loss 2. Clock pause at Tp when message is lost 3. Replay from clock Tp 4. Exactly-once
  • 47. 47 Checkpoint Store DAG runtime Key: Ensure State(t) only contains message(timestamp <= t) Exactly-once message processing How? append only checkpoint
  • 48. 48 Exactly-once message processing Two states State Accept (t < Tc) State Accept all Checkpoint Store Streaming SystemMessages Normal Flow
  • 49. 49 Exactly-once message processing Two states State Accept (t < Tc) State Accept all Checkpoint Store Streaming SystemMessages Recover checkpoint in failures Recovery Flow
  • 50. 50 How to do Dynamic DAG? Multiple-Version DAG A B C B’Message.time >= Tc Target Problem: Replace processor B with B’ at time Tc DAG(Version = 0) DAG(Version = 1) transit Message.time < Tc NO message loss during the transition A B C
  • 51. 51
  • 53. 53 References • 钟翔 大数据时代的软件架构范式:Reactive架构及Akka实践, 程序员期刊2015年2A期 • Gearpump whitepaper http://typesafe.com/blog/gearpump-real-time-streaming-engine-using-akka • 吴甘沙 低延迟流处理系统的逆袭, 程序员期刊2013年10期 • Stonebraker http://cs.brown.edu/~ugur/8rulesSigRec.pdf • https://github.com/intel-hadoop/gearpump • Gearpump: https://github.com/intel-hadoop/gearpump • http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/ • https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second- three-cheap-machines • Sqlstream http://www.sqlstream.com/customers/ • http://www.statsblogs.com/2014/05/19/a-general-introduction-to-stream-processing/ • http://www.statalgo.com/2014/05/28/stream-processing-with-messaging-systems/ • Gartner report on IOT http://www.zdnet.com/article/internet-of-things-devices-will-dwarf-number- of-pcs-tablets-and-smartphones/
  • 54. 54 Legal Disclaimers 1 This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. 2 Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com]. 3 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. § For more information go to http://www.intel.com/performance. Intel, the Intel logo, Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2015 Intel Corporation
  • 55. 55 Latest Performance evaluation on Gearpump 0.7 This is the latest performance test on version 0.7. Please see the embedded document on the right to see the configuration details.