© 2016 MapR Technologies
Scaling and Streaming in the Extreme
Jim Scott – Director, Enterprise Strategy & Architecture
@kingmesal #bigdataeverywhere
Topics
• Background
  – Fundamentals
• Zeta Architecture overview
• Messaging platform
  – Benefits
  – Building your applications
    • Including microservices
• Story time with examples
Background
Data is the Problem
• Stop talking about “Big Data” and start talking about “Data”
  – People argue over “what constitutes big data?”
• Enterprise Architecture is the solution
  – Your business applications depend on data
• Size REALLY doesn’t matter
  – I don’t have “big data” right now
  – Stop worrying about when you qualify your data as big
  – Build your applications so you do NOT have to rearchitect when you finally qualify your data as “big”
• Prepare for success
All About Scaling
• The Goal
  – Remove data silos and enable all ANALYTICS in one place
  – Remove the pain from figuring out how to get the data moved
• How many servers do you need to run your business?
  – More than one application server?
  – More than one web server?
  – More than one database server?
  – More than one cluster?
• Scalable resource management and infrastructure
Proper Allocation of Resources
Zeta Architecture
The Next Generation Enterprise Architecture
• Dynamic compute resources
• Common storage platform
• Real-time application support
• Flexible programming models
• Deployment management
• Solution-based approach
• Applications to operate a business
* This is a pluggable architecture
Advertising Platform on Zeta
Simplified Architecture
• Fewer moving parts
  – Fewer things to go wrong
• Better resource utilization
  – Scale any application up or down on demand
• Common deployment model (new isolation model)
  – Repeatability between environments (dev, QA, production)
• Improved integration testing
  – Listen to production streams in dev and QA (** this is a BIG DEAL! **)
• Shared file system
  – Get at the data anywhere in the cluster
  – Simplifies business continuity
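
Listening to production streams from dev and QA is practical because, in the Kafka model, reading a message does not remove it: each consumer group tracks its own position in the stream. Below is a minimal in-memory sketch of that idea. It is not the Kafka or MapR Streams API; the names `Log`, `append`, and `poll` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Log:
    """Append-only message log with one read offset per consumer group."""

    def __init__(self):
        self.messages = []               # readers never remove messages
        self.offsets = defaultdict(int)  # group name -> next index to read

    def append(self, message):
        self.messages.append(message)

    def poll(self, group, max_messages=10):
        """Return the next unread messages for this group and advance its offset."""
        start = self.offsets[group]
        batch = self.messages[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


log = Log()
for i in range(3):
    log.append(f"click-{i}")

production_batch = log.poll("production")
qa_batch = log.poll("qa")  # QA reads the same events; production is unaffected
print(production_batch)
print(qa_batch)
```

Because each group only moves its own offset, a QA reader can fall behind or replay without changing what production sees.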
Reminder

Messaging platform
Ability to Handle the “Extreme”
• 1+ Trillion Events
  – per day
• Millions of Producers
  – Billions of events per second
• Multiple Consumers
  – Potentially for every event
• Multiple Data Centers
  – Plan for success
  – Plan for drastic failure

Think that is crazy? Consider having 100 servers and performing monitoring and application logging:
  – 100 metrics per server
  – 60 samples per minute
  – 50 metrics per request
  – 1,000 log entries per request (abnormally small, depends on level)
  – 1 million requests per day
~ 2 billion events per day, for one small(ish) use case

[Slide labels: “Extreme”, “Average Reality”]
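
The "~ 2 billion events per day" estimate can be checked with a few lines of arithmetic. This sketch assumes that every server reports every metric at each sample and that the 1 million requests per day is the total across all 100 servers; the slide does not state either point, and the constant names are illustrative.

```python
SERVERS = 100
METRICS_PER_SERVER = 100
SAMPLES_PER_MINUTE = 60
MINUTES_PER_DAY = 24 * 60
REQUESTS_PER_DAY = 1_000_000
METRICS_PER_REQUEST = 50
LOG_ENTRIES_PER_REQUEST = 1_000

# Server monitoring: every metric on every server, sampled all day.
server_metric_events = (
    SERVERS * METRICS_PER_SERVER * SAMPLES_PER_MINUTE * MINUTES_PER_DAY
)
# Per-request metrics and log entries.
request_metric_events = REQUESTS_PER_DAY * METRICS_PER_REQUEST
log_events = REQUESTS_PER_DAY * LOG_ENTRIES_PER_REQUEST

total_events = server_metric_events + request_metric_events + log_events
print(f"{total_events:,} events per day")
```

Under these assumptions the log entries (1 billion) and the server metrics (864 million) dominate, and the total lands just under 2 billion.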
Which products are we discussing?
Logical Dataflow
[Diagram labels: Messaging, Analytics, Consumers, Stream Processors]
Considering a Messaging Platform
• 50-100k messages per second used to be good
  – Not really good enough to handle decoupled communication between services
• Kafka model is BLAZING fast
  – Kafka 0.9 API with message sizes at 200 bytes
  – MapR Streams on a 5 node cluster sustained 18 million events / sec
  – Throughput of 3.5 GB/s and over 1.5 trillion events / day
• Manual sharding is not a “great” solution
  – Adding more servers should be easy and foolproof, not painful
  – Yes, I have lived through this
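
The benchmark figures above are roughly consistent with each other, which a short calculation shows. The sketch assumes the 200-byte message size applies to the 18 million events / sec run and that the rate is sustained for a full day; the constant names are illustrative.

```python
EVENTS_PER_SECOND = 18_000_000
MESSAGE_BYTES = 200
NODES = 5
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = EVENTS_PER_SECOND * MESSAGE_BYTES
gb_per_second = bytes_per_second / 1e9        # decimal gigabytes
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
events_per_node = EVENTS_PER_SECOND // NODES  # per-node share of the load

print(f"{gb_per_second:.1f} GB/s")
print(f"{events_per_day:,} events per day")
print(f"{events_per_node:,} events per second per node")
```

This gives about 3.6 GB/s of payload and about 1.56 trillion events per day, close to the quoted 3.5 GB/s and "over 1.5 trillion".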
Easy Scale-out
• Stream processing engines built to consume via the Kafka API
  – Apache Flink
  – Apache Spark
  – Apache Apex (incubating)
  – Apache Storm
  – Apache Samza
  – Akka Streams - not Apache ;-)
  – StreamSets (effectively a stream processing engine, but different)
• Build your own (Simple API)
Advertising Server Use Case
• The red line is a message request and response
  – Work distribution
    • 1 to 1
    • 1 to many
  – RPC options
    • Manual sharding
    • Could automate, not easy
  – Decouple with a message
    • One topic to the ad engine
    • One topic per web server
• What about exception cases?
  – Web server dies
  – Ad server dies
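
The "decouple with a message" option can be sketched as a request/reply exchange over topics: one shared topic into the ad engine and one reply topic per web server. This is a simplified in-memory stand-in for a broker, not the Kafka or MapR Streams API. The topic names, message fields, and function names are all hypothetical.

```python
from collections import defaultdict, deque

# topic name -> queued messages (in-memory stand-in for a message broker)
topics = defaultdict(deque)


def web_server_request(server_id, request_id, page):
    """Publish an ad request, naming the topic where this server expects the reply."""
    topics["ad-requests"].append({
        "request_id": request_id,
        "reply_to": f"replies-{server_id}",
        "page": page,
    })


def ad_engine_step():
    """Consume one request and publish the response to the requester's own topic."""
    if not topics["ad-requests"]:
        return False
    request = topics["ad-requests"].popleft()
    topics[request["reply_to"]].append({
        "request_id": request["request_id"],
        "ad": f"ad-for-{request['page']}",
    })
    return True


web_server_request("web-1", 1, "/sports")
web_server_request("web-2", 2, "/news")
while ad_engine_step():
    pass

reply_for_web_1 = topics["replies-web-1"].popleft()
reply_for_web_2 = topics["replies-web-2"].popleft()
print(reply_for_web_1)
print(reply_for_web_2)
```

Neither side holds a direct connection to the other, so an ad engine instance can be added or replaced without the web servers knowing. In this toy version a consumed message is removed from the queue; a log-based platform would keep it and track offsets instead, which is what lets another instance resume after a failure.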
Behind the Curtains
[Diagram labels: Producer (x3), Activity Handler, Historical, Interesting Data, Real-time Analysis, Results Dashboard, Anomaly Detection]
Story time with examples
Ship picks up containers

Singapore
Arrives at destination

Tokyo
While en route to next destination

Washington
Where does the data live?

Singapore Washington
Tokyo
Feels like an Analogy
• Data is generated on the ship
  – Must have an easy way (i.e., foolproof) to move the data off the ship
• Each port stores the data from the ship
  – Moving data between locations
  – Analytics could happen at any location
• This is a multi-data-center time series data use case
  – Events from sensors = metrics
  – Same concepts as data center monitoring
[Diagram labels: Sensor (x3), Metrics Collector, Time series data, Document DB, Analytics]
Story Time Summary
• Resiliency in the metrics collector
  – Easily scalable regardless of how many sensors are added
• Replicate events between data centers
  – Security, business continuity, data ownership
• Perform analytics at the source for different use cases
  – Analytics on the event stream
  – Analytics on aggregated data in the database
  – Maybe you want your event stream to be your database
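
The two styles of analytics in the summary can be contrasted with a small sketch: one function inspects each event as it arrives, the other rolls the same events up into per-minute documents of the kind a document database might hold. The sample events, the alert threshold, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor events: (timestamp in seconds, sensor id, reading)
events = [
    (0, "s1", 20.0),
    (30, "s1", 22.0),
    (45, "s2", 5.0),
    (70, "s1", 80.0),
    (90, "s2", 7.0),
]

THRESHOLD = 50.0  # assumed alert level for the stream-side check


def alerts_on_stream(stream):
    """Analytics on the event stream: check each event individually."""
    return [(ts, sensor) for ts, sensor, value in stream if value > THRESHOLD]


def aggregate_per_minute(stream):
    """Analytics on aggregated data: one summary per sensor per minute."""
    buckets = defaultdict(list)
    for ts, sensor, value in stream:
        buckets[(sensor, ts // 60)].append(value)
    return {
        key: {"count": len(values), "mean": mean(values)}
        for key, values in buckets.items()
    }


alerts = alerts_on_stream(events)
documents = aggregate_per_minute(events)
print(alerts)
print(documents)
```

Both functions read the same event list, which is the sense in which the event stream itself can act as the system of record.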

“The truth is out there.”
– Spock
Wrap up
Q&A
@kingmesal
jscott@mapr.com
Engage with us!
kingmesal
