Realtime Statistics based on Apache Storm and RocketMQ
- Xin Wang
Dec.16, 2017, Shenzhen, Apache RocketMQ Meetup
Xin Wang
• Apache Storm Committer & PMC member
• Five years of distributed systems experience
• Love open source & community
• Focus on distributed technologies, especially stream processing
• https://github.com/vesense
Streaming and batch come from different worlds
and use different ways to solve different problems.
- Xin
Index
Part-1: Streaming Ecosystem
Part-2: Stateful Statistics based on Storm & RocketMQ
Part-3: Best Practices
01
Streaming Ecosystem
The Streaming Ecosystem
[Figure: streaming ecosystem components: Collector, Messaging, Streaming-Connector, Streaming, Stream API, Runtime, Streaming-State, SQL, CEP, ML, Storage, APP, Schema-Registry, Streaming-Manager; Deploy: Local, Cluster, Cloud]
Messaging: apache/kafka, rocketmq, pulsar
Streaming: apache/storm, flink, spark-streaming, kafka-streams
Schema-Registry: hortonworks/registry, confluentinc/schema-registry
Streaming-Manager: hortonworks/streamline
Which one is better for me?
• Simple API
• Fault-tolerant/Stable
• Scalable
• Performance (high throughput & low latency)
• Guarantees: at-least-once/exactly-once
• Mature
• Ecosystem
• Operation and Maintenance
• Support
• Code
Storm 2.0
• Port Clojure code to Java
• Unified Stream API
• Storm-SQL improvements
• Metrics V2
• Threading model redesign
• Lambda expression support - bolt: `tuple -> System.out.println(tuple)`
• Apache Beam Runner
• Worker classloader isolation
• Dynamic topology updates
• ...
02
Stateful Statistics based on Storm & RocketMQ
Realtime Architecture
[Figure: realtime architecture with Apache RocketMQ (Source Topic, Retry Topic, Sink Topic), an Apache Storm topology (nodes S, P, R), and Apache HBase / MySQL storage]
Stateful Statistics Challenges
[Figure: event sequences over time (expected order 1 2 3) compared with the loss, duplicating, out-of-order (3 2 1) and mutex cases]
Q:
Stateful counting is complex (state machine + streaming). Is there open source middleware for it?
A:
• loss -> compensating
  • s1 -> s2 on condition, when e3 arrives
• duplicating -> idempotent
  • exists(key)
• out-of-order -> compensating + idempotent
• mutex -> +/-
  • s1++ && s2--
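The answers above can be combined into a small per-key state machine. The following is a minimal stdlib-only sketch, not Alien's implementation. It assumes that each entity (identified by a key) moves through ordered states 1 -> 2 -> 3, that every event carries a unique event id, and that the names `StatefulCounter`, `onEvent` and `count` are hypothetical. Duplicates are dropped with an exists(key)-style check on the event id, a skipped state (lost event) is compensated by jumping straight to the later state, a stale (out-of-order) event is ignored, and the counts of mutually exclusive states are kept consistent with a -1/+1 pair.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: counts how many entities are currently in each state
// of an ordered state machine, tolerating loss, duplication,
// out-of-order delivery and mutually exclusive states.
class StatefulCounter {
    private final Set<String> seenEventIds = new HashSet<>();   // idempotence
    private final Map<String, Integer> stateOfEntity = new HashMap<>();
    private final int[] countPerState;                           // index = state

    StatefulCounter(int numStates) {
        this.countPerState = new int[numStates + 1];
    }

    /** Applies one event; returns false if it was a duplicate or stale. */
    boolean onEvent(String eventId, String entityKey, int newState) {
        // duplicating -> idempotent: exists(key) check on the event id
        if (!seenEventIds.add(eventId)) {
            return false;
        }
        int current = stateOfEntity.getOrDefault(entityKey, 0);
        // out-of-order: an event for an earlier (or the same) state is stale
        if (newState <= current) {
            return false;
        }
        // loss -> compensating: a jump such as 1 -> 3 is accepted even if
        // the event for state 2 never arrived
        // mutex -> +/-: leave the old state and enter the new one together
        if (current > 0) {
            countPerState[current]--;
        }
        countPerState[newState]++;
        stateOfEntity.put(entityKey, newState);
        return true;
    }

    int count(int state) {
        return countPerState[state];
    }
}
```

The class is not thread-safe and `seenEventIds` grows without bound, so a real deployment would need to partition events by key (for example with a fields grouping) and expire old event ids, for instance per window.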
Stateful Event Counting: Alien
Alien: a stateful event counting middleware.
• Handles event loss, duplication, out-of-order delivery and mutex (mutually exclusive) states
• Supports a time/event Window API
• Supports integration with streaming systems
• Supports dimension changes
• Supports sync/async snapshot storage
• Supports user-defined State and Snapshot Serializer
• Provides REST interfaces for state
• ...
// requires java.util.List, java.util.Map and the Alien classes (package not shown on the slide)
Alien alien = Alien.createAlien()
        .withState(new LocalMemoryState())
        .withWindow(new TimeWindow(2) {
            @Override
            public void accept(Map<String, View> views) {
                // window callback: print every row of the "report" view
                View view = views.get("report");
                List<Row> rows = view.getRows();
                for (Row r : rows) {
                    System.out.println(r.getDimensions() + "->" + r.getMetrics());
                }
            }
        });
alien.putEvent(new Event("name", "key"));
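Alien's API is not shown beyond this snippet, so the following is only a stdlib-only sketch of the idea behind a time window that reports metrics per dimension. It assumes tumbling event-time windows of a fixed length in milliseconds (the unit of `TimeWindow(2)` is not stated on the slide) and a simple count metric; `TumblingWindowCounter`, `add` and `closeWindowsBefore` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts events per dimension value in tumbling
// event-time windows and emits each window once it is closed.
class TumblingWindowCounter {
    private final long windowSizeMillis;
    // window start -> (dimension -> count); TreeMap keeps windows ordered by start
    private final TreeMap<Long, Map<String, Long>> windows = new TreeMap<>();

    TumblingWindowCounter(long windowSizeMillis) {
        this.windowSizeMillis = windowSizeMillis;
    }

    void add(long timestampMillis, String dimension) {
        // align the timestamp to the start of its window
        long start = timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
        windows.computeIfAbsent(start, k -> new TreeMap<>())
               .merge(dimension, 1L, Long::sum);
    }

    /** Removes and returns every window that ends at or before the watermark. */
    List<Map<String, Long>> closeWindowsBefore(long watermarkMillis) {
        List<Map<String, Long>> closed = new ArrayList<>();
        while (!windows.isEmpty()
                && windows.firstKey() + windowSizeMillis <= watermarkMillis) {
            closed.add(windows.pollFirstEntry().getValue());
        }
        return closed;
    }
}
```

A late event for a window that has already been closed would open that window again and be emitted at the next close; whether to drop such events or emit a correction is a design choice this sketch leaves out.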
03
Best Practices
Best Practices
• Heavy GC in workers:
  • worker restarts, heartbeat timeouts, poor performance -> watch your heap memory usage. Do you use local caches? Are your JVM options reasonable, e.g. -XX:CMSInitiatingOccupancyFraction?
• Topology design vs. performance:
  • poor performance -> put lightweight logic into the same bolt/operator
• Too many executors/tasks:
  • high cluster CPU load, poor performance -> tune the number of threads:
    for CPU-intensive tasks: task parallelism <= vcores,
    for IO-intensive tasks: vcores <= task parallelism <= N*vcores.
    Warning: threads in sun.nio.ch.EPollArrayWrapper.epollWait are reported as RUNNABLE although they are waiting for IO.
    CPU: user or sys time? Load: runnable tasks or IO tasks?
    Amdahl's law: total time = non-parallelizable part + parallelizable part; only the parallelizable part shrinks as threads are added.
• Data hot spots / data skew:
  • some nodes perform poorly -> choose the right hash key, use two-phase aggregation, or use micro-batching
• Serialization of big objects:
  • poor performance -> reduce object size and enable Kryo class registration (from 55 ms to 11 ms after Kryo registration)
• Too many logs:
  • poor performance -> avoid unnecessary logging
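The thread-count rules of thumb and Amdahl's law above can be turned into a quick calculation. This is a stdlib-only sketch; the IO multiplier `n` and the class and method names are assumptions for illustration, not Storm settings. Amdahl's law gives the speedup with `threads` workers when a fraction `p` of the work is parallelizable: 1 / ((1 - p) + p / threads).

```java
// Hypothetical helper illustrating the parallelism rules of thumb on this slide.
final class ParallelismHints {
    private ParallelismHints() {}

    /** CPU-intensive tasks: keep task parallelism <= number of vcores. */
    static int cpuBoundParallelism(int vcores) {
        return Math.max(1, vcores);
    }

    /** IO-intensive tasks: clamp the request to vcores <= parallelism <= n * vcores. */
    static int ioBoundParallelism(int vcores, int n, int requested) {
        int lower = Math.max(1, vcores);
        int upper = lower * Math.max(1, n);
        return Math.min(upper, Math.max(lower, requested));
    }

    /** Amdahl's law: speedup = 1 / ((1 - p) + p / threads), with 0 <= p <= 1. */
    static double amdahlSpeedup(double parallelFraction, int threads) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / threads);
    }
}
```

For example, `ParallelismHints.cpuBoundParallelism(Runtime.getRuntime().availableProcessors())` caps a CPU-bound bolt at the processors visible to the JVM. With p = 0.9, going from 10 to 100 threads only raises the speedup from about 5.3 to about 9.2, which is why adding executors eventually stops paying off.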
Data Hot Spots / Data Skew
Q:
partition = hash(key) % N
A:
• Choose the right hash key
  • Analyze historical data (e.g. with MapReduce) to find hot keys?
  • key == null? (all null keys end up in the same partition)
• Two-phase aggregation
• Use micro-batch / local-reduce
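[Figure: data flow diagrams with nodes S, P and G, showing keys k1 and k2 hash-partitioned, the hot key k1 split into k1+salt1 and k1+salt2, and micro-batching; annotations num(k1), num(P), num(S)]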
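The partitioning formula and the two-phase (salted) aggregation above can be sketched in-process as follows. This is an illustration under assumptions, not a Storm topology: `Math.floorMod` is used because Java's `hashCode()` can be negative, the salt is a round-robin suffix `#0` .. `#(saltBuckets - 1)`, and `SaltedAggregation` and its methods are hypothetical names. Phase 1 counts per salted key, so a hot key such as k1 is spread over several partial aggregators; phase 2 strips the salt and merges the partial counts per original key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of skew mitigation with key salting and two-phase aggregation.
final class SaltedAggregation {
    private SaltedAggregation() {}

    /** partition = hash(key) % N, kept non-negative with floorMod. */
    static int partition(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Phase 1: append a round-robin salt and count per salted key. */
    static Map<String, Long> partialCounts(List<String> keys, int saltBuckets) {
        Map<String, Long> partial = new HashMap<>();
        int i = 0;
        for (String key : keys) {
            String salted = key + "#" + (i++ % saltBuckets);
            partial.merge(salted, 1L, Long::sum);
        }
        return partial;
    }

    /** Phase 2: strip the salt and merge partial counts per original key. */
    static Map<String, Long> globalCounts(Map<String, Long> partial) {
        Map<String, Long> global = new HashMap<>();
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            String salted = e.getKey();
            // the salt is always the last "#..." suffix, so lastIndexOf is safe
            String original = salted.substring(0, salted.lastIndexOf('#'));
            global.merge(original, e.getValue(), Long::sum);
        }
        return global;
    }
}
```

In a Storm topology the two phases would typically be two bolts: the first grouped on the salted key, the second on the original key.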
RocketMQ-Streaming Integration
RocketMQ-Storm: https://github.com/apache/storm/tree/master/external/storm-rocketmq
• RocketMQSpout - Currently only RocketMQ push mode is supported; pull mode is planned. The default deserializer is StringScheme; you can override it by setting `RocketMQConfig.SCHEME`.
• RocketMQBolt - Sends asynchronously by default; call `withAsync(boolean async)` to change this.
• RocketMQState - For users of the Storm Trident API
• TopicSelector - Selects a topic based on the input Storm tuple
• TupleToMessageMapper - Maps a Storm tuple to a RocketMQ message. Implement the MessageBodySerializer interface to customize how the message body is serialized; the default implementation of MessageBodySerializer is `body.toString().getBytes(StandardCharsets.UTF_8)`.
• MessageRetryManager - Retry policy for failed messages
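The slide only says that `MessageRetryManager` implements a retry policy for failed messages. As an illustration of what such a policy can look like, here is a stdlib-only sketch with a maximum retry count and exponential backoff; `SimpleRetryPolicy` and its methods are hypothetical and do not reproduce the storm-rocketmq interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical retry policy: retry a failed message up to maxRetries times,
// doubling the delay each time; afterwards the caller should give up
// (e.g. send the message to a retry or dead-letter topic).
final class SimpleRetryPolicy {
    private final int maxRetries;
    private final long baseDelayMillis;
    private final Map<String, Integer> attempts = new HashMap<>();

    SimpleRetryPolicy(int maxRetries, long baseDelayMillis) {
        this.maxRetries = maxRetries;
        this.baseDelayMillis = baseDelayMillis;
    }

    /** Records a failure; returns the delay before the next retry, or -1 to give up. */
    long onFail(String messageId) {
        int n = attempts.merge(messageId, 1, Integer::sum);
        if (n > maxRetries) {
            attempts.remove(messageId);
            return -1;
        }
        return baseDelayMillis << (n - 1);  // base, 2 * base, 4 * base, ...
    }

    /** Forgets a message once it has been processed successfully. */
    void onAck(String messageId) {
        attempts.remove(messageId);
    }
}
```

Keeping the attempt count per message id lets an acked message start from a clean slate if it is ever replayed.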
RocketMQ-Spark: https://github.com/apache/rocketmq-externals/tree/master/rocketmq-spark
RocketMQ-Flink: Coming soon
RocketMQ-Avro: Coming soon
OpenMessaging-Streaming

Mais conteúdo relacionado

Mais procurados

Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...HostedbyConfluent
 
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...StreamNative
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAXAlpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAXRichard Rabins
 
How to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams SafeHow to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams Safeconfluent
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!makker_nl
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...HostedbyConfluent
 
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...HostedbyConfluent
 
Webinar Slides: Real time Recommendations with Redis, Java and Websockets
Webinar Slides: Real time Recommendations with Redis, Java and WebsocketsWebinar Slides: Real time Recommendations with Redis, Java and Websockets
Webinar Slides: Real time Recommendations with Redis, Java and WebsocketsRedis Labs
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleApache Kafka TLV
 
Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer confluent
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replicationVenu Ryali
 
LCA13: Web Server and Caching Technologies
LCA13: Web Server and Caching TechnologiesLCA13: Web Server and Caching Technologies
LCA13: Web Server and Caching TechnologiesLinaro
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using KafkaVenu Ryali
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using KafkaAkash Vacher
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafkaconfluent
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesVenu Ryali
 
Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Ying Xu
 

Mais procurados (20)

Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...
 
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAXAlpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
Alpha Five v10.NEW APPLICATION SERVER. CODELESS AJAX
 
How to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams SafeHow to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams Safe
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
 
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
 
Webinar Slides: Real time Recommendations with Redis, Java and Websockets
Webinar Slides: Real time Recommendations with Redis, Java and WebsocketsWebinar Slides: Real time Recommendations with Redis, Java and Websockets
Webinar Slides: Real time Recommendations with Redis, Java and Websockets
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer
 
Using Kafka to scale database replication
Using Kafka to scale database replicationUsing Kafka to scale database replication
Using Kafka to scale database replication
 
LCA13: Web Server and Caching Technologies
LCA13: Web Server and Caching TechnologiesLCA13: Web Server and Caching Technologies
LCA13: Web Server and Caching Technologies
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and Kubernetes
 
Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018Kafka elastic search meetup 09242018
Kafka elastic search meetup 09242018
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 

Semelhante a 3.2 Streaming and Messaging

HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Futurercastain
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorStéphane Maldini
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and MessagingXin Wang
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQXin Wang
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...NETWAYS
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyserAlex Moskvin
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkFabian Hueske
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...Databricks
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsDataWorks Summit/Hadoop Summit
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 

Semelhante a 3.2 Streaming and Messaging (20)

HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
 
Streaming and Messaging
Streaming and MessagingStreaming and Messaging
Streaming and Messaging
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Scalable Web Apps
Scalable Web AppsScalable Web Apps
Scalable Web Apps
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by ...
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
 
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 

Último

VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...kalichargn70th171
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 

Último (20)

VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 

3.2 Streaming and Messaging

  • 1. Realtime Statistics based on Apache Storm and RocketMQ - Xin Wang, Dec. 16, 2017, Shenzhen, Apache RocketMQ Meetup
  • 2. Xin Wang • Apache Storm Committer & PMC member • Five years of distributed systems experience • Love open source & community • Focus on distributed technologies, especially stream processing • https://github.com/vesense
  • 3. "Streaming and batch come from different worlds and use different ways to solve different problems." - Xin
  • 4. Index Part-1: Streaming Ecosystem Part-2: Stateful Statistics based on Storm & RocketMQ Part-3: Best Practices
  • 6. The Streaming Ecosystem [diagram: Collector, Messaging, Streaming-Connector, Streaming (SQL, CEP, ML, ..., Stream API, Runtime), Streaming-State, Streaming-Connector, Storage, APP, Schema-Registry, Streaming-Manager; Deploy: Local, Cluster, Cloud] • Messaging: apache/kafka, rocketmq, pulsar • Streaming: apache/storm, flink, spark-streaming, kafka-streams • Schema-Registry: hortonworks/registry, confluentinc/schema-registry • Streaming-Manager: hortonworks/streamline
  • 7. Which one is better for me? • Simple API • Fault-tolerant/Stable • Scalable • Performance (high throughput & low latency) • Guarantees: at-least-once/exactly-once • Mature • Ecosystem • Operation and maintenance • Support • Code
  • 8. Storm 2.0 • Port from Clojure to Java • Unified Stream API • Storm-SQL improvements • Metrics V2 • Threading model redesign • Lambda expression support, e.g. a bolt: `tuple -> System.out.println(tuple)` • Apache Beam Runner • Worker classloader isolation • Dynamic topology updates • ...
  • 9. 02 Stateful Statistics based on Storm & RocketMQ
  • 11. Stateful Statistics Challenges [diagram: the expected event sequence 1 2 3 over time, contrasted with four failure cases: loss, duplicating, out-of-order, mutex] Q: Complex: state machine + streaming. Is there open-source middleware for this? A: • loss -> compensating: s1 -> s2 on condition when e3 • duplicating -> idempotent: exists(key) • out-of-order -> compensating + idempotent • mutex -> +/-: s1++ && s2--
  • 12. Stateful Event Counting: Alien. What is Alien? A stateful event counting middleware. • Supports event loss, duplicating, out-of-order, mutex • Supports a time/event window API • Supports integration with streaming systems • Supports dimension changing • Supports sync/async snapshot storage • Supports user-defined State and Snapshot Serializer • Supports state REST interfaces • ... Example: `Alien alien = Alien.createAlien().withState(new LocalMemoryState()).withWindow(new TimeWindow(2) { @Override public void accept(Map<String, View> views) { View view = views.get("report"); List<Row> rows = view.getRows(); for (Row r : rows) { System.out.println(r.getDimensions() + "->" + r.getMetrics()); } } }); alien.putEvent(new Event("name", "key"));`
  • 14. Best Practices • Heavy worker GCs: worker restarts, heartbeat timeouts, poor performance -> take care of your heap memory usage. Do you use local caches? Are your JVM options reasonable, e.g. -XX:CMSInitiatingOccupancyFraction? • Topology design vs. performance: poor performance -> put lightweight logic into the same bolt/operator • Too many executors/tasks: high cluster CPU load, poor performance -> tune the number of threads. For CPU-intensive tasks: task parallelism <= vcores; for IO-intensive tasks: vcores <= task parallelism <= N * vcores. Warning: threads in sun.nio.ch.EPollArrayWrapper.epollWait show up as runnable. CPU: user CPU or sys CPU? Load: runnable tasks or IO tasks? Amdahl's law: total time = non-parallelizable part + parallelizable part • Data hot point / data skew: some nodes perform poorly -> choose the right hash key, use two-phase aggregation, or use micro-batching • Big object serialization: poor performance -> reduce object size and enable Kryo registration (from 55 ms to 11 ms after Kryo registration) • Too many logs: poor performance -> avoid unnecessary logging
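  • 15. Data Hot Point / Data Skew Q: partition = hash(key) % N A: • Choose the right hash key: MapReduce over historical data? key == null? • Two-phase aggregation: salt a hot key (k1 -> k1+salt1, k1+salt2), aggregate partial results per salted key, then merge them per original key • Use micro-batch / local reduce [diagram: stages S, P, P, G; keys k1 and k2; hot key k1 salted into k1+salt1 and k1+salt2; micro-batch; labels num(k1), num(P), num(S)]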
  • 16. RocketMQ-Streaming Integration. RocketMQ-Storm: https://github.com/apache/storm/tree/master/external/storm-rocketmq • RocketMQSpout: currently only RocketMQ push mode is supported; pull mode is planned. The default deserializer is StringScheme; you can override it by setting `RocketMQConfig.SCHEME`. • RocketMQBolt: sends asynchronously by default; change this by calling `withAsync(boolean async)` • RocketMQState: for users of the Storm Trident API • TopicSelector: selects a topic based on the input Storm tuple • TupleToMessageMapper: maps a Storm tuple to a RocketMQ message; you can implement the MessageBodySerializer interface to serialize the message body. The default MessageBodySerializer implementation is `body.toString().getBytes(StandardCharsets.UTF_8)` • MessageRetryManager: retry policy for failed messages. RocketMQ-Spark: https://github.com/apache/rocketmq-externals/tree/master/rocketmq-spark • RocketMQ-Flink: coming soon • RocketMQ-Avro: coming soon • OpenMessaging-Streaming
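The answers on the "Stateful Statistics Challenges" slide (exists(key) for idempotence, compensating for loss, +/- for mutex states) can be combined in a small in-memory counter. This Java sketch is a hypothetical illustration, not Alien's implementation. It assumes each event carries an entity id and a per-entity sequence number (`StateEvent`, `StateCounter`, and the sequence-number scheme are all assumptions). It ignores any event whose sequence number is not newer than the last one applied, and moves the entity's count from its old state to its new one. Java 16+ is needed for the record.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point is declared first so single-file launch picks it up.
class StateCounterDemo {
    public static void main(String[] args) {
        StateCounter counter = new StateCounter();
        counter.apply(new StateEvent("order-1", 1, "created"));
        counter.apply(new StateEvent("order-1", 3, "paid"));      // seq 2 was lost
        counter.apply(new StateEvent("order-1", 3, "paid"));      // duplicate delivery
        counter.apply(new StateEvent("order-1", 2, "confirmed")); // late, out-of-order event
        // Prints "created=0 confirmed=0 paid=1"
        System.out.println("created=" + counter.count("created")
                + " confirmed=" + counter.count("confirmed")
                + " paid=" + counter.count("paid"));
    }
}

/** Hypothetical event: an entity (e.g. an order) entering a state, tagged with a per-entity sequence number. */
record StateEvent(String entityId, long seq, String state) {}

/** Counts how many entities are currently in each (mutually exclusive) state. */
class StateCounter {
    // Newest applied event per entity; this lookup plays the role of exists(key).
    private final Map<String, StateEvent> latest = new HashMap<>();
    // State name -> number of entities currently in that state.
    private final Map<String, Long> counts = new HashMap<>();

    /** Applies an event and returns true if it changed the counts. */
    boolean apply(StateEvent event) {
        StateEvent prev = latest.get(event.entityId());
        // Duplicate or stale (out-of-order) event: ignoring it keeps the update idempotent.
        if (prev != null && event.seq() <= prev.seq()) {
            return false;
        }
        latest.put(event.entityId(), event);
        // Mutex states: leaving the old state is a decrement...
        if (prev != null) {
            counts.merge(prev.state(), -1L, Long::sum);
        }
        // ...and entering the new state is an increment. Jumping straight to the
        // newest state compensates for intermediate events that were lost.
        counts.merge(event.state(), 1L, Long::sum);
        return true;
    }

    long count(String state) {
        return counts.getOrDefault(state, 0L);
    }
}
```

Dropping stale events is only one way to handle out-of-order arrival; it fits when only the latest state matters. In production the per-entity map would also need to be persisted (or bounded, e.g. with a TTL).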
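The Alien example on the "Stateful Event Counting" slide uses a `TimeWindow`, but Alien's internals are not shown, so the following is only a generic sketch of what a time window over counted events does. Events are bucketed into fixed-size (tumbling) windows by event time and counted per key, and a window is emitted once a watermark passes its end. `TumblingWindowCounter`, `put`, and `closeBefore` are hypothetical names, and window sizes are assumed to be in milliseconds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class TumblingWindowDemo {
    public static void main(String[] args) {
        TumblingWindowCounter windows = new TumblingWindowCounter(2000); // 2-second windows
        windows.put("report", 1000);
        windows.put("report", 1500);
        windows.put("report", 2100); // falls into the next window [2000, 4000)
        System.out.println(windows.closeBefore(2000)); // prints {0={report=2}}
    }
}

/** Hypothetical tumbling-window counter keyed by dimension. */
class TumblingWindowCounter {
    private final long sizeMillis;
    // Window start -> (dimension key -> event count), ordered by window start.
    private final TreeMap<Long, Map<String, Long>> windows = new TreeMap<>();

    TumblingWindowCounter(long sizeMillis) {
        this.sizeMillis = sizeMillis;
    }

    /** Adds one event to the window that contains its event time. */
    void put(String key, long eventTimeMillis) {
        long start = eventTimeMillis - Math.floorMod(eventTimeMillis, sizeMillis);
        windows.computeIfAbsent(start, s -> new HashMap<>()).merge(key, 1L, Long::sum);
    }

    /** Removes and returns every window whose end is at or before the watermark. */
    Map<Long, Map<String, Long>> closeBefore(long watermarkMillis) {
        Map<Long, Map<String, Long>> closed = new TreeMap<>();
        while (!windows.isEmpty() && windows.firstKey() + sizeMillis <= watermarkMillis) {
            Map.Entry<Long, Map<String, Long>> oldest = windows.pollFirstEntry();
            closed.put(oldest.getKey(), oldest.getValue());
        }
        return closed;
    }
}
```

In this sketch an event arriving after its window was closed simply opens a fresh window for the same start time; how Alien treats such late events is not described on the slide.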
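The thread-count guideline and the Amdahl's law note on the "Best Practices" slide can be made concrete with a little arithmetic. In this sketch the helper names and the `ioFactor` parameter (standing in for the slide's N) are assumptions. `maxParallelism` returns the slide's upper bound on task parallelism, and `amdahlSpeedup` computes the standard formula 1 / ((1 - p) + p / n) for a parallelizable fraction p on n workers.

```java
class ParallelismDemo {
    /** Amdahl's law: speedup on n workers when a fraction p of the work is parallelizable. */
    static double amdahlSpeedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    /**
     * Upper bound from the slide's guideline: CPU-bound tasks get at most one thread per vcore;
     * IO-bound tasks may go up to N * vcores (ioFactor plays the role of N).
     */
    static int maxParallelism(boolean ioBound, int vcores, int ioFactor) {
        return ioBound ? vcores * ioFactor : vcores;
    }

    public static void main(String[] args) {
        int vcores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound cap: " + maxParallelism(false, vcores, 4));
        System.out.println("IO-bound cap (N = 4): " + maxParallelism(true, vcores, 4));
        // 90% parallelizable work: 8 workers give about 4.71x, 64 workers only about 8.77x.
        System.out.printf("%.2f %.2f%n", amdahlSpeedup(0.9, 8), amdahlSpeedup(0.9, 64));
    }
}
```

The second line of output shows why piling on executors stops paying off: with 10% non-parallelizable work the speedup can never exceed 10x, no matter how many threads are added.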
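The two-phase aggregation answer on the "Data Hot Point / Data Skew" slide can be sketched in a single process. This is a minimal illustration under stated assumptions: keys are salted round-robin with a `#n` suffix (the separator and the round-robin choice are assumptions; a random salt works the same way), phase one counts per salted key, and phase two strips the salt and merges per original key. For brevity every key is salted here; in practice you might salt only known hot keys. In a real topology each phase would run in its own operator, grouped by the salted key and by the original key respectively.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SaltedAggregationDemo {
    /** partition = hash(key) % N; floorMod keeps the result non-negative for negative hash codes. */
    static int partition(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Phase 1: salt each key so a hot key spreads over several partitions, then count per salted key. */
    static Map<String, Long> phaseOne(List<String> keys, int saltBuckets) {
        Map<String, Long> partial = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            String salted = keys.get(i) + "#" + (i % saltBuckets); // round-robin salt
            partial.merge(salted, 1L, Long::sum);
        }
        return partial;
    }

    /** Phase 2: strip the salt and merge the partial counts per original key. */
    static Map<String, Long> phaseTwo(Map<String, Long> partial) {
        Map<String, Long> totals = new HashMap<>();
        partial.forEach((salted, count) ->
                totals.merge(salted.substring(0, salted.lastIndexOf('#')), count, Long::sum));
        return totals;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k1", "k1", "k1", "k1", "k2", "k1");
        Map<String, Long> partial = phaseOne(keys, 2);
        partial.forEach((salted, count) -> System.out.println(
                salted + " count=" + count + " partition=" + partition(salted, 4)));
        Map<String, Long> totals = phaseTwo(partial);
        System.out.println("k1=" + totals.get("k1") + " k2=" + totals.get("k2")); // k1=5 k2=1
    }
}
```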
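The integration slide only names storm-rocketmq's `MessageRetryManager` as a "retry policy for failed messages", so this is not its API. It is a hypothetical bounded retry policy showing the general idea: count failures per message id, re-queue a failed message until a maximum number of attempts is reached, then give up so the caller can route it elsewhere (for example to a dedicated retry topic). `SimpleRetryManager`, `fail`, `ack`, and `nextRetry` are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class RetryDemo {
    public static void main(String[] args) {
        SimpleRetryManager retries = new SimpleRetryManager(2);
        // Prints "re-queued" for the first two failures, then "giving up" on the third.
        for (int i = 1; i <= 3; i++) {
            boolean requeued = retries.fail("msg-42");
            System.out.println("failure " + i + (requeued ? ": re-queued" : ": giving up"));
        }
    }
}

/** Hypothetical bounded retry policy; not the storm-rocketmq MessageRetryManager API. */
class SimpleRetryManager {
    private final int maxRetries;
    // Message id -> number of failures seen so far.
    private final Map<String, Integer> attempts = new HashMap<>();
    // Message ids waiting to be re-sent, in failure order.
    private final Queue<String> retryQueue = new ArrayDeque<>();

    SimpleRetryManager(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Records a failure; returns true if the message was re-queued, false if retries are exhausted. */
    boolean fail(String messageId) {
        int failures = attempts.merge(messageId, 1, Integer::sum);
        if (failures > maxRetries) {
            attempts.remove(messageId); // forget it; the caller decides where it goes next
            return false;
        }
        retryQueue.add(messageId);
        return true;
    }

    /** Records a success and clears the failure count. */
    void ack(String messageId) {
        attempts.remove(messageId);
    }

    /** Returns the next message id to re-send, or null if none is pending. */
    String nextRetry() {
        return retryQueue.poll();
    }
}
```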