Information systems in general, and business processes in particular, generate a wealth of information in the form of event traces or logs. The analysis of these logs, either offline or in real time, can be put to numerous uses: computing various statistics, detecting anomalous patterns, or detecting violations of some form of contract. However, current solutions for Complex Event Processing (CEP) generally offer only a restricted set of predefined queries on traces, and otherwise require the user to write procedural code to compute custom queries. In this presentation, we introduce a formal and declarative language for the manipulation of event traces.
3. Events
An event is an element e taken from some
set E, called the event type
Primitive types:
Booleans (𝔹)
Numbers (ℝ), e.g. 3, 4π, 2
Strings (𝕊), e.g. abc
Composite types:
Functions (X → Y)
Sets (2^X)
4. A sample log
[10:24:31] INFO Game starts
[10:24:33] WARN Lemming into Blocker...[
[10:25:01] DEBG Lemming into Floater, id: 32,
x: 320, y: 67 ; id: 31, x: 450, y: 43 ;
id: 23, x: 229, y: 40 ; ... ...
A file (or stream) of events
Each event has one or more
data elements
Actual (physical) format not relevant
for us
5. Searching the log
Select AVG(closingPrice)
From ClosingStockPrices
Where stockSymbol = 'MSFT'
for (t = ST; t < ST+50; t += 5) {
    WindowIs(ClosingStockPrices, t - 4, t);
}
6. Problems
Formal languages (e.g. logic, automata)
focus on event ordering; not so good at
performing computations over events
Complex Event Processing often reduces
to a thin layer over custom procedural
code
Goal: provide a formal and
non-procedural framework for
the processing of event streams
7. Traces
An event trace (or event stream) is a potentially
infinite sequence of events of a given type:
2 0 6 3 4 9 ...
Traces are symbolically denoted by:
$\bar{e} = e_0\, e_1\, e_2\, e_3 \ldots$
The set of all traces of type T is denoted as:
$T^*$
8. Processors
A processor is a function that takes 0 or more
event traces as input, and returns 0 or 1
event trace as output
(Figures: a 1 : 1 processor and a 2 : 1 processor)
9. Composition
A high-level event trace can be produced from
lower-level traces by composing ("piping")
together one or more processors
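As a rough illustration of piping, the sketch below models a trace as any Python iterable and a 1 : 1 processor as a function from an iterable to an iterable. This modelling, and the names `double`, `running_sum` and `pipe`, are assumptions made for the example; they are not taken from the prototype.

```python
from functools import reduce
from itertools import accumulate

def double(trace):
    """1:1 processor (hypothetical): multiplies every event by two."""
    return (2 * e for e in trace)

def running_sum(trace):
    """1:1 processor (hypothetical): cumulative sum of the events seen so far."""
    return accumulate(trace)

def pipe(*processors):
    """Compose 1:1 processors left to right into a single 1:1 processor."""
    return lambda trace: reduce(lambda t, p: p(t), processors, trace)

high_level = pipe(double, running_sum)
print(list(high_level([2, 1, 5, 0])))  # [4, 6, 16, 16]
```

Because generators are lazy, each output event is produced as soon as the corresponding input event has been consumed.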
10. Processor algebra
Goal: come up with a "toolbox" of basic
processors sufficient to perform various
computations over traces
11. A few useful functions
$\iota_t(x) = \begin{cases} t & \text{if } x = \varepsilon \\ x & \text{otherwise} \end{cases}$
Identity function: returns an event if given one,
or t if passed the empty event ε
$+(x) = \{x\}$
Wrap function
$-(\{x\}) = x$
Peel function
$/\pi$
Path function: returns subtree at end
of path π
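These four functions could be written as follows. This is only a sketch under stated assumptions: the empty event ε is modelled as `None`, sets as `frozenset`, and trees as nested dictionaries; none of these choices comes from the slides.

```python
EPSILON = None  # assumption: the empty event ε is modelled as None

def iota(t):
    """Return the function ι_t: x if x is an event, t if x is ε."""
    return lambda x: t if x is EPSILON else x

def wrap(x):
    """+(x) = {x}: put an event into a singleton set."""
    return frozenset([x])

def peel(s):
    """-({x}) = x: take the single element out of a singleton set."""
    (x,) = s  # raises ValueError if s is not a singleton
    return x

def path(pi):
    """/π: return the function that follows the keys of pi down a nested dict."""
    def follow(tree):
        for key in pi:
            tree = tree[key]
        return tree
    return follow

event = {"lemming": {"id": 32, "x": 320, "y": 67}}  # hypothetical event structure
print(path(["lemming", "x"])(event))  # 320
```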
12. Semantics
Processors can be defined formally by
describing how their output trace is created
from their input trace(s)
$\bar{e}_0, \ldots, \bar{e}_n : \varphi(x_0, \ldots, x_n)$
Input trace(s): $\bar{e}_0, \ldots, \bar{e}_n$, on the left of the colon
Symbolic variables: $x_i$ refers to the i-th trace
on the left
13. Constants as processors
Any element t of type T can be lifted as a
0 : 1 processor producing the infinite trace
t t t t ...
The constant processor t:
$\bar{e} : t = t\, t\, t \ldots$
14. Input/output
0 : 1 processors can be used to produce an
event trace out of an external source (e.g.
standard input or a file)
Ditto for 1 : 0 processors, which send an
event trace to an external destination
15. Mutator
Returns t, but only as many times as the
number of events received so far
(Figure: each event of the input trace is replaced by t)
i.e. "mutates" input events into t
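The constant processor and the mutator differ only in when they stop. A minimal sketch, assuming traces are Python iterables (the function names are hypothetical):

```python
from itertools import islice, repeat

def constant(t):
    """0:1 processor: the infinite trace t t t ..."""
    return repeat(t)

def mutator(t, trace):
    """1:1 processor: emits t once for each event received."""
    return (t for _ in trace)

print(list(islice(constant(7), 3)))  # [7, 7, 7]
print(list(mutator(1, "abc")))       # [1, 1, 1]
```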
16. Functions as processors
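Any n-ary function f defined on individual
events can be lifted to an n : 1 processor on
traces, by applying it successively to the n-tuples
of events found at the same position in each trace
(Figure: the processor + applied to two traces of numbers; each output event is the sum of the two corresponding input events, e.g. 6 + 1 = 7, 0 + 8 = 8, 2 + 3 = 5)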
17. Functions as processors
$\bar{e}_0, \bar{e}_1 : x_0 + x_1 = (e_{0,0} + e_{1,0}),\ (e_{0,1} + e_{1,1}),\ (e_{0,2} + e_{1,2}),\ \ldots$
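Lifting can be sketched in a few lines, again assuming traces are Python iterables. The helper name `lift` is hypothetical, and the example values are illustrative.

```python
import operator

def lift(f):
    """Turn an n-ary function on events into an n:1 processor on traces."""
    # zip pairs up the events found at the same position in each trace
    # and stops as soon as one of the input traces is exhausted.
    return lambda *traces: (f(*events) for events in zip(*traces))

add = lift(operator.add)
print(list(add([6, 0, 2], [1, 8, 3])))  # [7, 8, 5]
```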
18. Freeze
Returns the first event received, upon every
event received
(Figure: the input trace a b b ... produces the output trace a a a ...)
For an input trace $\bar{e}$, the output is $e_0\, e_0\, e_0 \ldots$
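A possible rendering of freeze as a Python generator, assuming traces are iterables (the name `freeze` is chosen for the example):

```python
def freeze(trace):
    """1:1 processor: repeat the first event once per event received."""
    it = iter(trace)
    for first in it:      # runs at most once: takes the first event
        yield first
        for _ in it:      # every later event triggers another copy
            yield first

print(list(freeze("abb")))  # ['a', 'a', 'a']
```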
19. Delay
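Returns the input trace, starting from its
n-th event
(Figure: a delay processor with n = 2 applied to a trace of letters)
For an input trace $\bar{e}$, the output is $e_n\, e_{n+1}\, e_{n+2} \ldots = \bar{e}^n : x$
where $\bar{e}^n$ denotes the suffix of $\bar{e}$ that starts at event n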
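With traces modelled as Python iterables (an assumption of this sketch), delay amounts to skipping the first n events:

```python
from itertools import islice

def delay(n, trace):
    """1:1 processor: drop events 0 to n-1, pass the rest through."""
    return islice(trace, n, None)

print(list(delay(2, "abcd")))  # ['c', 'd']
```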
20. Decimate
Returns every n-th event of the input trace
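(Figure: with n = 2, the input trace a b c ... produces the output trace a c ...)
$\bar{e} : \Psi_n x = e_0\, e_n\, e_{2n} \ldots$
$(\bar{e} : \Psi_n x)_i = (\bar{e} : x)_{ni}$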
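Under the same assumption that traces are Python iterables, decimation keeps the events at positions 0, n, 2n, and so on:

```python
from itertools import islice

def decimate(n, trace):
    """1:1 processor: keep events 0, n, 2n, ... of the input trace."""
    return islice(trace, 0, None, n)

print(list(decimate(2, "abcde")))  # ['a', 'c', 'e']
```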
22. Window
Simulates the application of a "sliding
window" to a trace
$\Upsilon_n \varphi$
Takes as arguments: another processor φ
and a window width n
Returns the result of φ after processing
events 0 to n-1...
Then the result of (a new instance of) φ
that processes events 1 to n...
...and so on
23. Window
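Example: execution of the processor
$\Upsilon_2\, {+}{+}$
on the trace
2 1 5 0
Each window of width 2 is fed to a new instance of ++ (cumulative sum), and the last value is kept:
2 1 → 2 3 → 3
1 5 → 1 6 → 6
5 0 → 5 5 → 5
Output trace: 3 6 5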
24. Window
The window processor can take any
processor as an argument...
...i.e. the sliding window can be applied to
anything.
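Formally:
$(\bar{e} : \Upsilon_n \varphi)_i = (\bar{e}^i : \varphi)_{n-1}$
where $\bar{e}^i$ is the suffix of $\bar{e}$ that starts at event i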
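The sliding window could be sketched as below. The sketch assumes that traces are Python iterables and that φ emits one output per input event, so that its output at index n-1 is available once n events have been seen; a processor that needs more input would require a different implementation.

```python
from collections import deque
from itertools import accumulate

def window(n, phi, trace):
    """For each window of n consecutive events, emit output n-1 of a fresh phi."""
    buf = deque(maxlen=n)          # holds the last n events
    for event in trace:
        buf.append(event)
        if len(buf) == n:
            outputs = list(phi(list(buf)))   # new instance of phi on this window
            yield outputs[n - 1]             # assumes phi emits one output per event

# accumulate plays the role of the cumulative sum processor
print(list(window(2, accumulate, [2, 1, 5, 0])))  # [3, 6, 5]
```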
25. Filter
Discards events from an input trace based
on a selection criterion
Φ φ
Takes as argument another processor φ
Evaluates φ on the trace that starts at event
0; returns that event if the first event
returned by φ is T (true)
Same process on the trace that starts at
event 1...
...and so on
26. Filter
Example: execution of the processor
$\Phi\, {\in} 2\mathbb{N}$
on the trace
2 1 5 0
The processor ∈ 2ℕ tests whether an event is an even number; only 2 and 0 pass.
Output trace: 2 0
27. Filter
The filter can take any processor as an
argument...
...including a processor that requires multiple
input events before outputting something
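Formally:
$\bar{e} : \Phi\varphi = \Phi(\bar{e}, \varphi),\ \bar{e}^1 : \Phi\varphi$
$\Phi(\bar{e}, \varphi) = \begin{cases} e_0 & \text{if } (\bar{e} : \varphi)_0 = T \\ \text{no event} & \text{otherwise} \end{cases}$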
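The definition can be transcribed almost literally for a finite trace. The sketch below is not a streaming implementation: it re-runs φ on every suffix of a list, which is quadratic but keeps the correspondence with the formula visible. The names `filter_trace`, `is_even` and `increasing` are hypothetical.

```python
def filter_trace(phi, events):
    """Keep event i when the first output of phi on the suffix starting at i is True."""
    events = list(events)
    for i in range(len(events)):
        first = next(iter(phi(events[i:])), None)  # None if phi emits nothing
        if first is True:
            yield events[i]

def is_even(trace):
    """1:1 processor: membership test for the even numbers."""
    return (x % 2 == 0 for x in trace)

def increasing(trace):
    """Needs two input events before emitting: is the next event larger?"""
    return (a < b for a, b in zip(trace, trace[1:]))

print(list(filter_trace(is_even, [2, 1, 5, 0])))     # [2, 0]
print(list(filter_trace(increasing, [2, 1, 5, 0])))  # [1]
```

The second call illustrates a φ that requires multiple input events before outputting something: an event is kept only if the event that follows it is larger.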
28. Spawn
Cumulative combination of a processor's
output for every suffix of a trace
$\Sigma_f \varphi$
Creates one new instance of processor
φ upon every new input event
Feeds each input event to all existing
instances of φ
Combines the value returned by each
instance using function f
...and outputs it
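29. Spawn
Example: execution of the processor
$\Sigma_+ x$
on the trace
2 1 5 0
One instance of x is created per input event; the instances process 2 1 5 0, then 1 5 0, then 5 0, and so on.
The first outputs of the instances (2, 1, 5, 0) are combined cumulatively with +.
Output trace: 2 3 8 8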
30. Spawn
Formally:
$\bar{e} : \Sigma_f \varphi = (\bar{e} : \varphi)_0,\ f\big((\bar{e} : \varphi)_0,\ \bar{e}^1 : \Sigma_f \varphi\big)$
Turns out to be a powerful device; depending
on φ and f, can provide many useful
processors...
31. Spawn
Count events: $\Sigma_+ 1 = \#$
Cumulative sum: $\Sigma_+ x = {+}{+}$
Set of all events: $\Sigma_\cup {+} = \cup$
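The three derived processors can be reproduced with a small sketch of spawn on finite traces. It rests on several assumptions that are not in the slides: traces are Python lists, φ emits at least one event on every non-empty suffix, and f is associative (so that folding from the left gives the same result as the recursive definition).

```python
import operator

_NOTHING = object()  # marks "no value accumulated yet"

def spawn(f, phi, events):
    """Fold f over the first output of phi on each suffix of a finite trace."""
    events = list(events)
    acc = _NOTHING
    for i in range(len(events)):
        first = next(iter(phi(events[i:])))   # first output of phi on suffix i
        acc = first if acc is _NOTHING else f(acc, first)
        yield acc

def identity(trace):            # the processor x
    return iter(trace)

def ones(trace):                # the constant 1, one per event received
    return (1 for _ in trace)

def wrapped(trace):             # the wrap function + lifted to traces
    return (frozenset([e]) for e in trace)

print(list(spawn(operator.add, ones, [2, 1, 5, 0])))      # count: [1, 2, 3, 4]
print(list(spawn(operator.add, identity, [2, 1, 5, 0])))  # sum:   [2, 3, 8, 8]
print(list(spawn(operator.or_, wrapped, [2, 1, 2]))[-1])  # set of all events seen
```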
37. All together now
Count pairs of successive events that are
more than one standard deviation from
the mean
(Figure, built up step by step over slides 37 to 41: a chain of processors that combines the input X, its mean E(X), a subtraction (-), the standard deviation σ, a division (÷), the comparison > 1, a conjunction (∧), two filters (Φ) and the counter #)
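For comparison, here is one possible reading of this query written as ordinary procedural Python rather than as a processor chain. Several points are assumptions of the sketch, not statements from the slides: the mean and standard deviation are computed over all events seen so far, the population standard deviation is used, the deviation is taken in absolute value, and a running count is emitted for every input event.

```python
import math

def count_outlier_pairs(trace):
    """Running count of successive event pairs that both lie more than
    one standard deviation away from the running mean."""
    n = 0
    total = 0.0
    total_sq = 0.0
    previous_is_outlier = False
    count = 0
    for x in trace:
        n += 1
        total += x
        total_sq += x * x
        mean = total / n
        variance = max(total_sq / n - mean * mean, 0.0)  # guard against rounding
        sd = math.sqrt(variance)
        is_outlier = sd > 0 and abs(x - mean) / sd > 1
        if is_outlier and previous_is_outlier:
            count += 1
        previous_is_outlier = is_outlier
        yield count

print(list(count_outlier_pairs([0, 0, 0, 10, 10])))  # [0, 0, 0, 0, 1]
```

The bookkeeping of counters, flags and partial sums is exactly the kind of imperative detail that the processor chain leaves implicit.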
42. Advantages
No imperative constructs
No restrictions on what can be piped to
what (modulo type compatibility)
Streaming operation: outputs produced
as inputs are being consumed
Implicit handling of buffering, duplication,
etc.
43. Demo!
Prototype implementation in Java
In this example, handles 100 events/sec.
Go see it on YouTube: http://goo.gl/QoS8Dy