2. Welcome
• High Velocity Big Data
• What is Complex Event Processing?
• Analyzing Time Series with SAX
• What is Map/Reduce?
• Correlating with Historical Data
• Using the Cloud
• Questions
CLOUD
EVENT
PROCESSING
3. Data Growth*
[Bar chart: data volume by category, y-axis 0–18, rising steeply across Category 1–4]
*It would appear that things will actually get worse, not better
4. High Velocity Big Data
• What is Big Data?
– You’ve got Big Data issues when you can’t turn the
data into information fast enough to act on:
• Earthquake
• Brownout
• Market Crash
• Terrorist Event
– You’ve got Big Data when you have to consider its
actual physicality
• What is High Velocity Big Data?
– Big Data In Flight…
• You don’t get to store it before you analyze it
5. What is Complex Event Processing?
• Complex Event Processing (CEP) delivers high-speed
processing of many events across all the
layers of an organization, identifying only the
most meaningful events within the event
cloud, analyzing their impact, and taking
subsequent action in real time.
– From Wikipedia
6. What? What is CEP?
• Domain Specific Language
– Makes it easier to deal with events
• Continuous Query
– select symbol, side, price from tradeStream
• Time/Length Windows
– select symbol, side, avg(price) from tradeStream.win:time(10 minutes) group by symbol, side
• Pattern Matching
– select a.* from pattern [every a=FIXNewOrderSingle -> (timer:interval(30 seconds) and not FIXNewOrderSingle(a.Side != Side and a.OrderQty = OrderQty and a.Symbol = Symbol))]
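The continuous-query idea above can be sketched outside any CEP engine. Below is a toy Python sliding time window that keeps a running avg(price) per (symbol, side), in the spirit of the tradeStream query; the TimeWindowAvg class and its method names are invented for illustration and are not Esper's API:

```python
from collections import deque

class TimeWindowAvg:
    """Toy continuous query: avg(price) per (symbol, side) over a sliding time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = {}  # (symbol, side) -> deque of (timestamp, price)

    def on_event(self, ts, symbol, side, price):
        q = self.events.setdefault((symbol, side), deque())
        q.append((ts, price))
        # Expire events that have fallen out of the window
        while q and q[0][0] <= ts - self.window:
            q.popleft()
        return sum(p for _, p in q) / len(q)
```

A 10-minute window would be `TimeWindowAvg(600)`: each arriving trade updates the group's average, and old trades silently age out, which is exactly the state a CEP engine manages for you behind the query.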
7. Wouldn’t It Be Cool
• select * from everything where itsInteresting = toMe in last 10 minutes;
• select * from everything where earthQuake > .8;
• select * from everything where terroristsWillStrike > .9;
8. CEP – Current Benefits*
• Really Fast!
• Low Latency!
• Provides a ready-made framework for building
real-time pattern-matching applications
• Think at a higher level
– Productivity
*your mileage may vary, widely
9. CEP – Current Limitations
• Memory Bound
– If you have a lot of events and windows, you risk
running out of memory on a single machine
• Compute Bound
– To ensure high throughput and low latency, most
CEP engines are actually doing simplistic things
• e.g. Filtering events
• Black Box
– What’s going on in there?
10. Checkpoint
• Ok, so by using Complex Event Processing
– You can analyze data in flight
– But
• You’re constrained by:
– Available compute
– Memory
• Because there’s still too much data to process
on one machine…
11. The Problem With Time Series
• Dimensionality
– How can I recognize something?
• Distance Measures
– How do I find similar occurrences?
• Time
– By the time I process the data, the information
has little value…
12. Symbolic Aggregate Approximation (SAX)
• SAX reduces numerical data to a short string, or SAX word
• Thousands of data points of numerical, continuous data become ‘ABCEDEFGH’
• The SAX approximation of the data fits in main memory, yet retains features of interest
• Creating SAX words from historical and streaming data allows us to perform all kinds of magic…
SAX Advantages:
• Patterns identified and described using SAX actually look like the underlying data
• Other algorithms sometimes don’t actually describe the underlying patterns, or take way too much work to be useful in real time
[Figure: a time series over 0–120 discretized into symbols a–c, yielding the SAX word ‘baabccbc’]
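The encoding described above can be sketched in a few lines of Python. This follows the standard published SAX recipe (z-normalize, then Piecewise Aggregate Approximation, then map segment means to symbols via Gaussian breakpoints); the function name is mine, and the breakpoints shown are the standard table values for an alphabet of size 3:

```python
import statistics

# Gaussian breakpoints for alphabet size 3, from the standard SAX lookup table
BREAKPOINTS = [-0.4307, 0.4307]

def sax_word(series, n_segments, alphabet="abc"):
    # z-normalize (guard against a flat series with zero variance)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series) or 1.0
    x = [(v - mean) / std for v in series]
    # Piecewise Aggregate Approximation: mean of each of n_segments chunks,
    # then map each mean to a symbol by counting breakpoints below it
    seg = len(x) / n_segments
    word = ""
    for i in range(n_segments):
        chunk = x[int(i * seg): int((i + 1) * seg)]
        paa = sum(chunk) / len(chunk)
        word += alphabet[sum(paa > b for b in BREAKPOINTS)]
    return word
```

A 100-point series collapses to, say, a 4-character word, which is the dimensionality reduction the slides lean on: the word is small enough to index, compare, and stream, yet its shape tracks the original signal.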
13. SAX – 5 Use Cases
• Indexing
– Given a time series, find similar time series in the database
• Clustering
– Find natural grouping in the time series
• Classification
– Automagically sort patterns found in time series into
categories
• Summarization
– Condense verbose data into meaningful information
• Anomaly Detection
– Find surprising, interesting, or unexpected behavior
14. Why SAX is Cool
• Lower Bounding
– The patterns identified and described using SAX
actually look like the underlying data
• Dimensionality Reduction
– Previously intractable problems become possible in
real time
• Other algorithms sometimes don’t describe the
underlying patterns, or take way too much work to
be useful in real time
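Lower bounding has a concrete form in the SAX literature: a MINDIST function on two SAX words that never overestimates the Euclidean distance between the original series, so searches on the compressed words cannot miss a true match. A sketch for an alphabet of size 3 (the breakpoint values come from the standard SAX table; the function names are mine):

```python
import math

# Gaussian breakpoints for alphabet size 3 (standard SAX table)
BREAKPOINTS = [-0.4307, 0.4307]

def symbol_dist(a, b):
    # Adjacent symbols have distance 0; otherwise use the breakpoint gap
    i, j = sorted((ord(a) - 97, ord(b) - 97))
    return 0.0 if j - i <= 1 else BREAKPOINTS[j - 1] - BREAKPOINTS[i]

def mindist(word1, word2, n):
    """Lower bound on Euclidean distance between the two original n-point series."""
    w = len(word1)
    return math.sqrt(n / w) * math.sqrt(
        sum(symbol_dist(a, b) ** 2 for a, b in zip(word1, word2))
    )
```

Because MINDIST only ever underestimates, it can prune candidates cheaply during indexing and clustering, which is what makes the previously intractable problems on the slide feasible in real time.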
17. And Finally, SAX
[Figure: a time series discretized into symbols A–G, yielding the SAX word ‘EDDCCBC’]
18. Checkpoint
• We’ve reduced dimensionality
• We know where we are
– The current pattern is AABASDGF
• We’re calculating it in ‘real-time’*
– Using Complex Event Processing
• But
– There’s still too much data to process on one
machine…
• How can we process more data in the same
amount of time?
*I much prefer the term event-driven
19. What is Map/Reduce?
• Framework for processing ginormous datasets using a large number
of computers (nodes) in a cluster.
• "Map"
Master node takes the input, chops it up into smaller
sub-problems, and distributes those to worker nodes. The worker node
processes that smaller problem, and passes the answer back to its
master node.
• "Reduce"
Takes the answers to all the sub-problems and combines them in a
way to get the output - the answer to the problem it was originally
trying to solve.
– From Wikipedia
20. What? What is Map/Reduce?
• WordCount Example (classic)
– Map scans text for words and emits {word,1}
– Combine collapses key values on the same node:
{word,1,1,1} -> {word,3}
– Shuffle/Sort merges results from different nodes
• {node A,”NoSQL”,50} {node B,”Oracle”,50} {node B,”NoSQL”,50}
– becomes
• {node A,”NoSQL”,50} {node B,”NoSQL”,50} {node B,”Oracle”,50}
– Reduce
• Outputs {“NoSQL”,100} {“Oracle”,50}
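The WordCount flow above can be sketched as plain Python, with map, shuffle, and reduce as ordinary functions rather than Hadoop APIs (the function names here are illustrative):

```python
from collections import defaultdict
from itertools import chain

def map_phase(text):
    # Emit {word, 1} for every word in this node's chunk of input
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    # Merge pairs from all nodes, grouping values under their key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into the final answer
    return {key: sum(values) for key, values in groups.items()}

# Two "nodes" each map their own chunk; shuffle merges the results by key
mapped = chain(map_phase("NoSQL Oracle NoSQL"), map_phase("NoSQL"))
counts = reduce_phase(shuffle(mapped))  # {'NoSQL': 3, 'Oracle': 1}
```

The point of the ceremony is that map_phase runs independently on every node and reduce_phase only ever sees grouped keys, so the whole pipeline scales out by adding machines.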
21. SAX and Map/Reduce
• SAX is an ‘embarrassingly parallel’ problem
• Using parallel processing allows SAX words to
be computed more quickly
• Using Streaming Map/Reduce provides results
even faster, increasing the value of data even
more
– Partition by symbol and sort by timestamp
– Calculate SAX words for each symbol, in parallel
• CEP Time Windows to the Rescue!
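The "partition by symbol, compute in parallel" step can be sketched as follows; everything here is illustrative (a trivial up/down encoder stands in for a real SAX implementation, and the function names are mine), but it shows why the problem is embarrassingly parallel: each symbol's stream is encoded with no reference to any other symbol's data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def encode(prices):
    # Stand-in for a real SAX encoder: 'u' if a tick rose, 'd' otherwise
    return "".join("u" if b > a else "d" for a, b in zip(prices, prices[1:]))

def partition_by_symbol(events):
    # events: (symbol, timestamp, price) tuples; group by symbol, sort by time
    parts = defaultdict(list)
    for sym, ts, px in events:
        parts[sym].append((ts, px))
    return {sym: [px for _, px in sorted(v)] for sym, v in parts.items()}

def sax_per_symbol(events):
    parts = partition_by_symbol(events)
    # Each partition is independent, so the encoders can run concurrently
    with ThreadPoolExecutor() as pool:
        return dict(zip(parts, pool.map(encode, parts.values())))
```

In the streaming version the partitions would be CEP time windows per symbol rather than a finished list, but the shape is the same: partition, then map in parallel.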
22. Checkpoint
• CEP is great, but I still have to tell it what I’m
looking for, right?
• SAX can help us reduce dimensionality, what
else can it do for us?
• How do I relate Streaming Data to Historical
Data?
• How do I do this while the Information still has
value?
23. High Velocity Big Data Pattern
[Diagram: live events enter via an OnRamp and fan out, alongside historical events, to parallel Map (SAX) tasks; Reduce combines the results into Context]
24. So What Do We Need?
• Complex Event Processing
• The Algorithm (SAX)
• Processing Model – Streaming Map/Reduce
• Context – The Historical Aspect
• What Do We Call This?
25. What is DarkStar?
– Platform as a Service (PaaS)
• Provides Distributed
– Complex Event Processing
– Streaming Map/Reduce
– Messaging
– Web Services
– Monitoring/Management
– Applications are built on top, or inside
• SAX runs inside of DarkStar
– SAX is not a component of DarkStar, but an add-in library
– And deployed in a cluster
• Virtualized Resources
26. DarkStar
• What patterns are occurring in my data, right
now?
– CEP based streaming Map/Reduce
• Use a cluster of machines
• When did this pattern happen before?
– Database with embedded Map/Reduce
• No need to move data outside the database for
processing
27. The Cloud
• Elastic Resource
– Grows/Shrinks according to demand
• Virtualization
– Efficient utilization of compute
• The Previously Unthinkable
– Is now possible, if not already commonplace
• Peering can provide access to Big Pipes and
Secure Data
28. Thank You!
• Questions?
• Contact Me
– Colin Clark
– @EventCloudPro
– cpclark@cloudeventprocessing.com