Streaming Goes Mainstream: New Architecture & Emerging Technologies for Stream Transport and Processing
- 1.
© 2016 MapR Technologies
Streaming Goes Mainstream:
Transport, Processing & Architecture
Ellen Friedman
12 October 2016 Women in Big Data Meetup #datawomen
- 2.
Contact Information
Ellen Friedman
Solutions Consultant, MapR Technologies
Committer Apache Drill & Apache Mahout projects
Author, O’Reilly short books
Email ellenf@apache.org efriedman@maprtech.com
Twitter @Ellen_Friedman #datawomen
- 3.
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015
- 4.
The entire industry is undergoing a career change
- 5.
Big Data has caught on
• Potential value of big data approaches is widely recognized
• Technologies for distributed storage at low cost are maturing
• People are looking for operational and analytical solutions in order to take advantage of large scale data opportunities…
• Now there’s a new form of revolution based on streaming data
- 6.
Why stream?
- 7.
“Our best understanding comes when our conclusions fit the evidence. And that is most effectively done when our analyses fit the way life happens.”
- Introduction to Apache Flink, Friedman & Tzoumas (O’Reilly, Sept 2016)
- 8.
Life doesn’t happen in batches…
- 9.
Time Series Data & the IoT
Sensors in airplanes not only send data to the ERD (black box); they also report back to manufacturers of “smart parts” such as turbines found in jet engines or wind farms.
Images © Friedman & Dunning from O’Reilly book A New Look at Anomaly Detection, used with permission
- 10.
Big data project: Maury’s Wind and Currents charts
- Value from big data in aggregate
- Crowd sourced
- But static: not real-time insights
- 11.
Modern big data navigation: WAZE
• Uses real-time streaming traffic & road information shared by 65 million drivers/month
• Intended to save fuel and time during commute
• Partnered with Esri GIS software to help put data insights to work for cities, states
  11 Oct 2016 article in TechCrunch: http://bit.ly/tech-crunch-waze-esri
• Time-value of data often is important
“Outsmarting traffic, together”
- WAZE website https://www.waze.com/
- 12.
Crowd-sourced Traffic
Streaming sensor data + long-term maintenance histories:
• Machine learning model detects anomalous pattern
• Signals need for maintenance before damage occurs
Image courtesy Mtell; from Real World Hadoop by Dunning & Friedman (© 2015) Chap 6
- 13.
Streaming is mainstream
- 14.
Web-based Business
A: Real-time insights from low latency applications (update a real-time dashboard)
B: Current status updated in databases or search documents (Customer 360)
C: Durable messages for auditable history (Security analytics)
[Diagram: Messages and Logs flow to A (real-time dashboards), B (Customer 360 database), and C (security analytics on archived data)]
- 16.
Streaming data has value beyond real-time insights
- 18.
At the heart of an effective streaming architecture is the right choice of stream transport.
- 19.
Message Stream Transport
Apache Kafka
or
MapR Streams
Others
- 20.
Message Transport Technology: Kafka & MapR Streams
Key capabilities
● Highly scalable
● High throughput, low latency
● Decouple multiple producers & consumers
● Durable messages with configurable time to live
● Geo-distributed replication (MapR Streams)
[Diagram: two producers write messages that are read independently by three consumer groups]
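The decoupling and time-to-live ideas on this slide can be sketched with a toy in-memory log. This is only an illustration: `MessageLog`, its methods, and the sample payloads are hypothetical names, not the Kafka or MapR Streams API, and a real transport persists and distributes the log.

```python
from dataclasses import dataclass, field


@dataclass
class MessageLog:
    """Toy append-only log (hypothetical; not a Kafka or MapR Streams API)."""
    ttl_seconds: float
    messages: list = field(default_factory=list)        # (offset, timestamp, payload)
    next_offset: int = 0
    group_offsets: dict = field(default_factory=dict)   # group -> next offset to read

    def produce(self, payload, now):
        # Producers only append; they know nothing about who reads.
        self.messages.append((self.next_offset, now, payload))
        self.next_offset += 1

    def expire(self, now):
        # Drop messages older than the time to live; offsets are never reused.
        self.messages = [m for m in self.messages if now - m[1] <= self.ttl_seconds]

    def poll(self, group):
        # Each consumer group tracks its own position, independent of the others.
        start = self.group_offsets.get(group, 0)
        batch = [(off, p) for off, _, p in self.messages if off >= start]
        if batch:
            self.group_offsets[group] = batch[-1][0] + 1
        return [p for _, p in batch]


log = MessageLog(ttl_seconds=60)
log.produce("click:home", now=0)
log.produce("click:cart", now=10)

dashboard = log.poll("dashboard")        # reads the first two messages
log.produce("click:pay", now=20)
dashboard_next = log.poll("dashboard")   # reads only the new message
audit = log.poll("audit")                # a group that starts later still sees all three

log.expire(now=65)                       # the message from t=0 is past its TTL
late_group = log.poll("late")            # sees only the unexpired messages
```

Because messages stay in the log until they expire, a consumer group added later can still read history that earlier groups have already processed.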
- 21.
Alert: Pre-conceptions can make you miss new ideas
• It’s hard to order a coffee if you want mostly milk
• Example: MapR Streams is part of the converged data platform so does not require a separate cluster for message transport (as you would with Kafka)
• Example: Message streams can support microservices
“Getting Past Pre-conceptions” http://bit.ly/mapr-blog-ef-17-08
- 22.
MapR Streams: Topics, Partitions
• Data is assigned to topics (as in Kafka)
• Topic can be partitioned for load balancing/performance (as in Kafka)
• Topic partition is distributed across the MapR cluster (not restricted to one node as in Kafka)
– Makes long-term auditable history practical
[Diagram: Producer 1 and Producer 2 write to Topic 1; Consumer 1, Consumer 2, and Consumer 3 read it as one consumer group]
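A minimal sketch of how keyed messages might be spread over partitions and how partitions might be divided among a consumer group. The function names are hypothetical and the CRC32 hash and round-robin assignment are assumptions chosen for simplicity; real transports use their own partitioners and assignment strategies.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # A stable hash sends the same key to the same partition every time,
    # which preserves per-key ordering.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, consumers: list) -> dict:
    # Round-robin: each partition is read by exactly one consumer in the group.
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# Two producers' messages for Topic 1, keyed by a made-up card id.
partitions = {p: [] for p in range(3)}
for key, value in [("card-17", "swipe"), ("card-42", "swipe"), ("card-17", "refund")]:
    partitions[partition_for(key, 3)].append((key, value))

assignment = assign_partitions(3, ["consumer-1", "consumer-2", "consumer-3"])
```

With three partitions and three consumers, each consumer in the group reads one partition, so load is shared while each key is still seen in order by a single consumer.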
- 23.
Stream-first Architecture: Basis for MicroServices
Stream as the shared “truth” instead of a database
Database as local truth
[Diagram labels: POS 1..n; Other card activity; Fraud detector; Last card use; Updater; Card analytics]
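One way to picture "stream as shared truth, database as local truth" is two services that read the same card-activity stream and each build their own private state. The event fields, the `min_gap` rule, and the class names below are invented for illustration and are not taken from the talk.

```python
# Shared stream of point-of-sale events (made-up sample data).
card_activity = [
    {"card": "A", "city": "Lisbon", "t": 0},
    {"card": "B", "city": "Tokyo", "t": 5},
    {"card": "A", "city": "Sydney", "t": 10},
]


class FraudDetector:
    """Keeps a local 'last card use' table derived from the stream."""

    def __init__(self, min_gap):
        self.min_gap = min_gap
        self.last_use = {}   # local truth: card -> most recent event
        self.alerts = []

    def handle(self, event):
        prev = self.last_use.get(event["card"])
        # Flag a card used in two different cities within min_gap time units.
        if (prev is not None and prev["city"] != event["city"]
                and event["t"] - prev["t"] < self.min_gap):
            self.alerts.append(event["card"])
        self.last_use[event["card"]] = event


class CardAnalytics:
    """Builds a different local view from the same events."""

    def __init__(self):
        self.uses_per_city = {}

    def handle(self, event):
        city = event["city"]
        self.uses_per_city[city] = self.uses_per_city.get(city, 0) + 1


fraud = FraudDetector(min_gap=60)
analytics = CardAnalytics()
for service in (fraud, analytics):
    for event in card_activity:   # each service reads the stream independently
        service.handle(event)
```

Neither service queries the other's database; adding a third service means adding another reader of the stream, not changing the existing two.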
- 24.
MapR Streams: Part of MapR Converged Data Platform
[Diagram labels: Open Source Engines & Tools; Commercial Engines & Applications; Search & Others; Cloud & Managed Services; Custom Apps; Utility-Grade Platform Services; Data Processing; Enterprise Storage; MapR-FS; MapR-DB (Database); MapR Streams (Event Streaming); Global Namespace; High Availability; Data Protection; Self-healing; Unified Security; Real-time; Multi-tenancy; Unified Management and Monitoring]
MapR Converged Data Platform has distributed files, NoSQL DB & message streams engineered into one technology
- 25.
Unique to MapR: Manage topics at Stream level
• Topics are grouped together in a Stream (different from Kafka)
• Policies such as time-to-live and ACEs are set at the Stream level (controlled access at this level is different from Kafka)
• Geo-distributed replication at Stream level (different from Kafka)
[Diagram: one Stream containing Topic 1, Topic 2, and Topic 3]
- 26.
MapR Streams: Geo-distributed replication of message stream across data centers
- 27.
Multiple Stakeholders: Container Shipping
Over 20% of world’s shipping containers pass through Singapore’s port.
Image © Ellen Friedman 2015
- 28.
MapR Streams replication across data centers
A: Sensors stream data to on-board cluster that reports to onshore cluster while in port
B: MapR Streams geo-replication sends data to next port before ship arrives.
C: Real-time insights alert to “high humidity” in some containers
[Diagram: clusters in Singapore, Tokyo, Sydney, and Corporate HQ, with data flows labeled A, B, and C]
Find details on this use case in Chap 7 of book “Streaming Architecture”
Read online here: http://bit.ly/streams-ebook-ch7
- 29.
MapR Streams: Replication Across Data Centers
What’s the value?
– Replication across data centers with preserved offsets (unlike Kafka)
– Opens new use cases:
  – Example: Shared inventory, as with ad-tech use case
[Diagram: Data center 1 and Data center 2 each hold an inventory model and local state; a central data center holds global analytics and a database]
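Why preserved offsets matter can be shown with a toy replica: if the copy keeps the same offsets as the source, a consumer's committed position remains meaningful in the other data center. `StreamReplica` and `replicate` are hypothetical names and this is not how MapR Streams implements replication; it only illustrates the property.

```python
class StreamReplica:
    """Toy copy of one stream, addressed by offset (hypothetical)."""

    def __init__(self):
        self.by_offset = {}

    def append(self, offset, payload):
        self.by_offset[offset] = payload

    def read_from(self, offset):
        return [self.by_offset[o] for o in sorted(self.by_offset) if o >= offset]


def replicate(source, target):
    # Copy every message under the same offset it has in the source.
    for offset, payload in source.by_offset.items():
        target.append(offset, payload)


dc1 = StreamReplica()
for i, bid in enumerate(["ad-1", "ad-2", "ad-3", "ad-4"]):
    dc1.append(100 + i, bid)

dc2 = StreamReplica()
replicate(dc1, dc2)

# A consumer in data center 1 had processed offsets 100 and 101,
# so its committed position (next offset to read) is 102.
committed = 102
resumed = dc2.read_from(committed)   # resume in data center 2 at the same position
```

If the replica renumbered messages instead, the committed position would have to be translated before a consumer could move between data centers.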
- 30.
What about stream processing?
- 31.
Several good choices for stream processing
• You choose the tool you like for processing streaming data
– MapR ships & supports the full Apache Spark stack including Spark Streaming
– Apache Flink has been benchmarked on MapR with extremely good performance on MapR Streams transport; Flink not yet supported by MapR
– Other good options include Apache Apex (think DataTorrent) & Apache Storm
- 32.
Overview: Apache Flink Stream Processing
[Figure labels: Transport: Kafka / MapR Streams; Processing: Flink; Database; File]
Figure 2-1 from “Introduction to Apache Flink” book, used with permission.
Download free pdf here: http://bit.ly/mapr-intro-flink-book-pdf
- 33.
Overview: Apache Flink
• Top level Apache project with big international OSS community
• True stream processing
– Advantage if SLAs require extremely low latency (real-time)
– Good fit to continuous events
• Also works well for batch processing
• Being used in production (telecom; games)
- 34.
Flink is BIG in Europe ;-)
- 35.
Stream Processing: Compare Choices
“Real-time” event-by-event processing
• Apache Flink
• Apache Apex
• Apache Storm
Not “real-time” processing: micro-batching
• Apache Spark Streaming
But latency is just one issue to consider in choosing a stream processing technology…
- 36.
Capabilities for Stream Processing Options
[Figure: Flink, Spark Streaming, and Storm compared on five capabilities: correct under stress; correct time / window semantics; ease of use / expressiveness; high throughput; low latency]
Figure 1-2 from “Introduction to Apache Flink” book, used with permission.
Download free pdf here: http://bit.ly/mapr-intro-flink-book-pdf
- 37.
Overview: Apache Flink Windowing
Before: Windows defined by micro-batches (not Flink)
Now: Windows defined by gap between activity (this is Flink)
[Figures: activity series A, B, and C, first divided by micro-batch boundaries, then divided by the gap between bursts of activity]
Figures 3-1 and 3-2 from “Introduction to Apache Flink” book, used with permission.
Download free pdf here: http://bit.ly/mapr-intro-flink-book-pdf
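The difference between the two windowing styles can be sketched in plain Python, without using Flink's own API. The timestamps, window size, and gap below are made-up values, and treating a pause of exactly `gap` as part of the same session is a choice made for this sketch.

```python
def fixed_windows(timestamps, size):
    # Micro-batch style: boundaries fall at fixed multiples of `size`,
    # regardless of where the activity actually pauses.
    windows = {}
    for t in timestamps:
        windows.setdefault(t // size, []).append(t)
    return [windows[k] for k in sorted(windows)]


def session_windows(timestamps, gap):
    # Session style: a new window starts only after a pause longer than `gap`.
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


clicks = [1, 3, 4, 9, 11, 30, 31]   # event times in seconds (sample data)

fixed = fixed_windows(clicks, size=10)
# [[1, 3, 4, 9], [11], [30, 31]]: the first burst is cut at t=10

sessions = session_windows(clicks, gap=5)
# [[1, 3, 4, 9, 11], [30, 31]]: each burst stays in one window
```

The fixed boundary at t=10 splits one burst of activity into two windows, while the gap-based windows follow the natural pause in the data.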
- 38.
Overview: Apache Flink Event Time
[Figure labels: Processing time; Event time]
Computation can be based on when data is processed OR when event occurred
In many situations, processing by event time provides more accurate results.
Figure 3-3 from “Introduction to Apache Flink” book, used with permission.
- 39.
Overview: Apache Flink Event Time
Stephan Ewen, Apache Flink PMC Committer, explaining event time processing option for Flink in a Whiteboard Walkthrough video:
http://bit.ly/mapr-whiteboard-walkthrough-flink-event-time
When you analyze data by event time, you must take into account that events may arrive delayed or out of order.
This is important for use cases in which you want to correlate events.
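A small sketch of why the choice of time matters when events arrive late. It does not use Flink's API; the events, the 10-second window, and the `closed_windows` helper (a much simplified stand-in for a watermark) are assumptions made for illustration.

```python
from collections import Counter

WINDOW = 10  # window length in seconds (illustrative)

# (event_time, arrival_time) pairs; the second event was delayed in transit.
events = [(2, 3), (8, 14), (12, 13), (15, 16)]

# Processing time: bucket each event by when it arrived.
by_processing_time = Counter(arrival // WINDOW for _, arrival in events)

# Event time: bucket each event by when it actually happened.
by_event_time = Counter(occurred // WINDOW for occurred, _ in events)


def closed_windows(counts, max_event_time_seen, allowed_lateness):
    # Simplified watermark: assume nothing older than this will still arrive,
    # so windows ending at or before it can be reported as final.
    watermark = max_event_time_seen - allowed_lateness
    return {w: n for w, n in counts.items() if (w + 1) * WINDOW <= watermark}


ready = closed_windows(by_event_time, max_event_time_seen=15, allowed_lateness=5)
```

Here the delayed event is counted in the second window under processing time (1 and 3 events) but in the first window under event time (2 and 2 events), and only the first event-time window is treated as final so far.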
- 40.
Apache Flink: Useful Characteristics
• Stateful processing & accuracy under stress: Checkpoints
• Windowing options are a good fit to the way natural sessions occur
• Event time option for accurate computation
– See Whiteboard Walkthrough video by Stephan Ewen (PMC member Apache Flink) on event time
http://bit.ly/mapr-whiteboard-walkthrough-flink-event-time
• Savepoints let you reprocess data (bug fixes, updates, etc.)
– See Whiteboard Walkthrough video by Stephan Ewen on Flink savepoints
http://bit.ly/whiteboard-walkthrough-flink-1
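The savepoint idea can be sketched as "state plus stream position, captured together". The code below is a plain-Python illustration under that assumption, not Flink's savepoint mechanism; the `process` function, the sample stream, and the case-sensitivity bug are all invented.

```python
import copy


def process(stream, state, start, stop, normalize):
    # Count events by key for offsets in [start, stop).
    for offset in range(start, stop):
        key = normalize(stream[offset])
        state[key] = state.get(key, 0) + 1
    return state


def v1(event):
    return event            # buggy version: treats "LOGIN" and "login" as different


def v2(event):
    return event.lower()    # fixed version


stream = ["login", "login", "LOGIN", "logout"]

# Run v1 on the first two events, then capture state and position together.
state = process(stream, {}, 0, 2, v1)
savepoint = {"offset": 2, "state": copy.deepcopy(state)}

# v1 keeps running and produces a wrong count for the third event.
state_v1 = process(stream, state, 2, 4, v1)

# After the fix, restart v2 from the savepoint and reprocess from offset 2.
state_v2 = process(stream, copy.deepcopy(savepoint["state"]),
                   savepoint["offset"], 4, v2)
```

Reprocessing is possible only because the stream transport still holds the messages after the savepoint's offset, which ties this back to durable messages in the transport layer.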
- 41.
Streaming Resources from MapR (thank you)
Free resource from MapR: book on Apache Spark
Download free pdf courtesy of MapR Technologies
http://bit.ly/mapr-apache-spark-book-pdf
Or read online:
http://bit.ly/mapr-apache-spark-ebook
- 42.
Streaming Resources from MapR (thank you)
Free resource from MapR: book on stream-first architecture & message transport
Download free pdf courtesy of MapR Technologies
http://bit.ly/mapr-streams-ebook
Or read online:
http://bit.ly/mapr-streaming-data-ebook
- 43.
Streaming Resources from MapR (thank you)
Free resource from MapR: book on Apache Flink stream processing
Download free pdf courtesy of MapR Technologies
http://bit.ly/mapr-intro-flink-book-pdf
Or read online: <coming soon>
New ebook by Ellen Friedman and Kostas Tzoumas:
Introduction to Apache Flink: Stream Processing for Real Time and Beyond
In this book you’ll learn:
· What Apache Flink can do
· How it maintains consistency and provides flexibility
· How people are using it, including in production
· Best practices for streaming architectures
Download your copy: mapr.com/flink-book
- 44.
Short Books by Ted Dunning & Ellen Friedman
For sale from Amazon or O’Reilly
Free pdf download courtesy of MapR www.mapr.com/ebook
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
http://bit.ly/mapr-ebook-sharing-data
- 45.
Please support women in tech – help build
girls’ dreams of what they can accomplish
© Ellen Friedman 2015
- 46.
Thank you!