1. Scaling and Streaming in the Extreme
© 2016 MapR Technologies
Jim Scott – Director, Enterprise Strategy & Architecture
@kingmesal #bigdataeverywhere
2. Topics
• Background
  – Fundamentals
• Zeta Architecture overview
• Messaging platform
  – Benefits
  – Building your applications
    • Including microservices
• Story time with examples
3. Background
4. Data is the Problem
• Stop talking about “Big Data” and start talking about “Data”
  – People argue over “what constitutes big data?”
• Enterprise Architecture is the solution
  – Your business applications depend on data
• Size REALLY doesn’t matter
  – I don’t have “big data” right now
  – Stop worrying about when you qualify your data as big
  – Build your applications so you do NOT have to rearchitect when you finally qualify your data as “big”
• Prepare for success
5. All About Scaling
• The Goal
  – Remove data silos and enable all ANALYTICS in one place
  – Remove the pain from figuring out how to get the data moved
• How many servers do you need to run your business…
  – More than one application server?
  – More than one web server?
  – More than one database server?
  – More than one cluster?
• Scalable resource management and infrastructure
6. Proper Allocation of Resources
7. Zeta Architecture
8. The Next Generation Enterprise Architecture
• Dynamic compute resources
• Common storage platform
• Real-time application support
• Flexible programming models
• Deployment management
• Solution-based approach
• Applications to operate a business
* This is a pluggable architecture
9. Advertising Platform on Zeta
10. Simplified Architecture
• Fewer moving parts
  – Fewer things to go wrong
• Better resource utilization
  – Scale any application up or down on demand
• Common deployment model (new isolation model)
  – Repeatability between environments (dev, QA, production)
• Improved integration testing
  – Listen to production streams in dev and QA (** this is a BIG DEAL! **)
• Shared file system
  – Get at the data anywhere in the cluster
  – Simplifies business continuity
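Listening to production streams in dev and QA works because each consumer tracks its own position in the stream, so one reader never takes events away from another. The following is a minimal in-memory sketch of that idea. `StreamLog` and the group names are hypothetical, made up for illustration; this is not the MapR Streams or Kafka API.

```python
from collections import defaultdict


class StreamLog:
    """Append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self._events = []
        self._offsets = defaultdict(int)  # group name -> next index to read

    def publish(self, event):
        self._events.append(event)

    def poll(self, group, max_events=10):
        """Return unread events for `group` and advance only that group's offset."""
        start = self._offsets[group]
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


log = StreamLog()
for i in range(3):
    log.publish({"order_id": i})

prod_batch = log.poll("production-billing")   # production reads the events
qa_batch = log.poll("qa-billing")             # QA sees the very same events
prod_again = log.poll("production-billing")   # nothing new for production
```

Because reading does not remove anything from the log, a QA or dev consumer can be attached or detached without the production consumer noticing.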
11. Reminder…
12. Messaging platform
13. Ability to Handle the “Extreme”
Extreme
• 1+ Trillion Events
  – per day
• Millions of Producers
  – Billions of events per second
• Multiple Consumers
  – Potentially for every event
• Multiple Data Centers
  – Plan for success
  – Plan for drastic failure
Average Reality
Think that is crazy? Consider having 100 servers and performing:
Monitoring and Application logs…
  – 100 metrics per server
  – 60 samples per minute
  – 50 metrics per request
  – 1,000 log entries per request (abnormally small, depends on level)
  – 1 million requests per day
~ 2 billion events per day, for one small(ish) use case
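The "~ 2 billion events per day" figure can be checked with a few lines of arithmetic. This sketch assumes that every server metric is sampled 60 times per minute around the clock, and that the 1 million requests per day is the total across all 100 servers; the slide does not spell out either point.

```python
SERVERS = 100
METRICS_PER_SERVER = 100
SAMPLES_PER_MINUTE = 60
MINUTES_PER_DAY = 24 * 60
METRICS_PER_REQUEST = 50
LOG_ENTRIES_PER_REQUEST = 1_000
REQUESTS_PER_DAY = 1_000_000  # assumed to be the total, not per server

# Infrastructure monitoring: every metric on every server, sampled all day.
server_metric_events = (
    SERVERS * METRICS_PER_SERVER * SAMPLES_PER_MINUTE * MINUTES_PER_DAY
)
# Application metrics and log entries scale with request volume.
request_metric_events = METRICS_PER_REQUEST * REQUESTS_PER_DAY
log_events = LOG_ENTRIES_PER_REQUEST * REQUESTS_PER_DAY

total_events = server_metric_events + request_metric_events + log_events
print(f"{total_events:,}")  # 1,914,000,000
```

Under these assumptions the log entries (1 billion) and the server metrics (864 million) dominate, and the total rounds to the 2 billion quoted on the slide.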
14. Which products are we discussing?
15. Logical Dataflow
[Diagram labels: Messaging, Analytics, Consumers, Stream Processors]
16. Considering a Messaging Platform
• 50-100k messages per second used to be good
  – Not good enough to handle decoupled communication between services
• Kafka model is BLAZING fast
  – Kafka 0.9 API with message sizes at 200 bytes
  – MapR Streams on a 5-node cluster sustained 18 million events / sec
  – Throughput of 3.5 GB/s and over 1.5 trillion events / day
• Manual sharding is not a “great” solution
  – Adding more servers should be easy and foolproof, not painful
  – Yes, I have lived through this
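The benchmark numbers above hang together, which a quick calculation shows. This sketch assumes decimal gigabytes (1 GB = 10^9 bytes) and counts message payload only, with no protocol overhead.

```python
EVENTS_PER_SECOND = 18_000_000
MESSAGE_BYTES = 200
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = EVENTS_PER_SECOND * MESSAGE_BYTES
gb_per_second = bytes_per_second / 1e9          # decimal GB, payload only
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

print(f"{gb_per_second:.1f} GB/s, {events_per_day:,} events/day")
# 3.6 GB/s, 1,555,200,000,000 events/day
```

The result is close to the quoted 3.5 GB/s and matches "over 1.5 trillion events / day". The slide does not say how its throughput figure was rounded or measured, so the small gap is left unexplained here.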
17. Easy Scale-out
• Stream processing engines built to consume via the Kafka API
  – Apache Flink
  – Apache Spark
  – Apache Apex (incubating)
  – Apache Storm
  – Apache Samza
  – Akka Streams - not Apache ;-)
  – StreamSets (effectively a stream processing engine, but different)
• Build your own (Simple API)
18. Advertising Server Use Case
• The red line is a message request and response
  – Work distribution
    • 1 to 1
    • 1 to many
  – RPC Options
    • Manual sharding
    • Could automate, not easy
  – Decouple with a message
    • One topic to the ad engine
    • One topic per web server
• What about exception cases
  – Web server dies
  – Ad server dies
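The "decouple with a message" option can be sketched as request/response over topics: every web server publishes to one shared request topic, and the ad engine answers on a reply topic that belongs to the requesting web server. The code below is an in-memory stand-in for a broker. The topic names, message fields, and helper functions are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict, deque
import itertools

topics = defaultdict(deque)  # topic name -> pending messages (broker stand-in)
_request_ids = itertools.count(1)

AD_REQUEST_TOPIC = "ad-requests"  # the one topic to the ad engine


def web_server_request(server_name, page):
    """Publish an ad request and say which topic the reply should go to."""
    request_id = next(_request_ids)
    topics[AD_REQUEST_TOPIC].append(
        {"request_id": request_id, "reply_to": f"replies-{server_name}", "page": page}
    )
    return request_id


def ad_engine_step():
    """Handle one pending request, if any, and publish the response."""
    if not topics[AD_REQUEST_TOPIC]:
        return False
    request = topics[AD_REQUEST_TOPIC].popleft()
    topics[request["reply_to"]].append(
        {"request_id": request["request_id"], "ad": f"ad-for-{request['page']}"}
    )
    return True


first = web_server_request("web-1", "/home")
second = web_server_request("web-2", "/sports")
while ad_engine_step():
    pass

web1_reply = topics["replies-web-1"].popleft()
web2_reply = topics["replies-web-2"].popleft()
```

Neither side needs to know how many instances of the other exist, which is what removes the manual sharding. In this sketch, a request published while the ad engine is down simply waits in the request topic, and a reply for a web server that has died stays in that server's reply topic; how a real deployment should treat those leftovers is a design decision the slide leaves open.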
19. Behind the Curtains
[Diagram labels: Producer (x3), Activity Handler, Historical, Interesting Data, Real-time Analysis, Results Dashboard, Anomaly Detection]
20. Story time with examples
21. Ship picks up containers…
Singapore
22. Arrives at destination…
Tokyo
23. While en route to next destination…
Washington
24. Where does the data live…
Singapore, Tokyo, Washington
25. Feels like an Analogy
• Data is generated on the ship
  – Must have an easy way (i.e., foolproof) to move the data off the ship
• Each port stores the data from the ship
  – Moving data between locations
  – Analytics could happen at any location
• This is a multi-data center time series data use case
  – Events from sensors = metrics
  – Same concepts as data center monitoring
26. [Diagram labels: Sensor (x3), Metrics Collector, Time series data, Document DB, Analytics]
27. Story Time Summary
• Resiliency in the metrics collector
  – Easily scalable regardless of how many sensors are added
• Replicate events between data centers
  – Security, business continuity, data ownership
• Perform analytics at the source for different use cases
  – Analytics on the event stream
  – Analytics on aggregated data in the database
  – Maybe you want your event stream to be your database…
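The two kinds of analytics in the summary can run against the same sensor events: one looks at each event as it arrives, the other at aggregates of the kind a database would hold. Here is a minimal sketch. The sensor names, the alert threshold, and the one-minute bucket size are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor events; "ts" is seconds since the start of the voyage.
events = [
    {"sensor": "container-7-temp", "ts": 0, "value": 4.0},
    {"sensor": "container-7-temp", "ts": 30, "value": 6.0},
    {"sensor": "container-7-temp", "ts": 65, "value": 9.0},
    {"sensor": "container-9-temp", "ts": 10, "value": 2.0},
]

ALERT_THRESHOLD = 8.0  # assumed limit, for illustration only


def stream_alerts(stream):
    """Analytics on the event stream: flag each event as it goes by."""
    return [e for e in stream if e["value"] > ALERT_THRESHOLD]


def minute_averages(stream):
    """Analytics on aggregated data: average value per sensor per minute."""
    buckets = defaultdict(list)
    for e in stream:
        buckets[(e["sensor"], e["ts"] // 60)].append(e["value"])
    return {key: mean(values) for key, values in buckets.items()}


alerts = stream_alerts(events)
averages = minute_averages(events)
```

Both functions only read the events, so if the stream is retained long enough the aggregates can be rebuilt from it at any time, which is the sense in which the event stream could serve as the database.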
28. “The truth is out there.”
– Spock
29. Wrap up
31. Q&A
@kingmesal
jscott@mapr.com
Engage with us!