Graph mining is a challenging task by itself, and even more so when processing data streams that evolve in real time. Data stream mining is constrained in both the time and space available for processing, and must also detect concept drift. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.
1. Mining Frequent Closed Graphs on Evolving Data Streams
A. Bifet, G. Holmes, B. Pfahringer and R. Gavaldà
University of Waikato
Hamilton, New Zealand
Laboratory for Relational Algorithmics, Complexity and Learning (LARCA)
UPC-Barcelona Tech, Catalonia
San Diego, 24 August 2011
17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2011)
2. Mining Evolving Graph Data Streams
Problem
Given a data stream D of graphs, find frequent closed graphs.
[Table: example transactions 1–3, each a small molecule graph built from C, S, N and O atoms]
We provide three algorithms, of increasing power:
Incremental
Sliding Window
Adaptive
3. Non-incremental Frequent Closed Graph Mining
CloseGraph: Xifeng Yan, Jiawei Han
right-most extension based on depth-first search
based on gSpan (ICDM'02)
MoSS: Christian Borgelt, Michael R. Berthold
breadth-first search
based on MoFa (ICDM'02)
4. Outline
1 Streaming Data
2 Frequent Pattern Mining
Mining Evolving Graph Streams
ADAGRAPHMINER
3 Experimental Evaluation
4 Summary and Future Work
7. Data Streams
Data Streams
Potentially infinite sequence
Large volume of data
High speed of arrival
Once an element from a data stream has been processed, it is discarded or archived
Tools:
approximation
randomization, sampling
sketching
8. Data Streams: Approximation Algorithms
1011000111 1010101
Sliding Window
We can maintain simple statistics over sliding windows, using O((1/ε) log² N) space, where
N is the length of the sliding window
ε is the accuracy parameter
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SODA 2002.
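The cited sliding-window statistics can be illustrated with an exponential histogram of power-of-two buckets in the spirit of Datar–Gionis–Indyk–Motwani. The following is a simplified sketch, not the paper's exact algorithm; the class name and the `k` accuracy knob are ours.

```python
class DGIMCounter:
    """Approximate count of 1s among the last N stream bits, using
    exponential-histogram buckets: at most k buckets of each
    power-of-two size, giving polylogarithmic space in N."""

    def __init__(self, window_len, k=2):
        self.N = window_len
        self.k = k                  # max buckets per size (accuracy knob)
        self.t = 0                  # current timestamp
        self.buckets = []           # (end_timestamp, size), newest first

    def add(self, bit):
        self.t += 1
        # expire buckets that slid out of the window
        while self.buckets and self.buckets[-1][0] <= self.t - self.N:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.t, 1))
        size = 1
        while True:
            same = [b for b in self.buckets if b[1] == size]
            if len(same) <= self.k:
                return
            # merge the two oldest buckets of this size into one of 2*size
            a, b = same[-2], same[-1]
            self.buckets.remove(a)
            self.buckets.remove(b)
            merged = (a[0], 2 * size)       # keep the newer end timestamp
            pos = 0
            while pos < len(self.buckets) and self.buckets[pos][0] > merged[0]:
                pos += 1
            self.buckets.insert(pos, merged)
            size *= 2

    def count(self):
        # all full buckets, counting only half of the oldest (partial) one
        if not self.buckets:
            return 0
        return sum(s for _, s in self.buckets) - self.buckets[-1][1] // 2
```

Because only the oldest bucket is uncertain, the estimate is off by at most half of its size; larger `k` shrinks that bucket relative to the total.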
18. Itemset Mining
Documents: d1 = abce, d2 = cde, d3 = abce, d4 = acde, d5 = abcde, d6 = bcd

Support | Frequent                    | Gen    | Closed | Max
6       | c                           | c      | c      |
5       | e, ce                       | e      | ce     |
4       | a, ac, ae, ace              | a      | ace    |
4       | b, bc                       | b      | bc     |
4       | d, cd                       | d      | cd     |
3       | ab, abc, abe, be, bce, abce | ab, be | abce   | abce
3       | de, cde                     | de     | cde    | cde
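The closed and maximal columns of this table can be recomputed mechanically. A brute-force sketch (fine at this six-document scale) over the same transactions:

```python
from itertools import combinations

# The six documents from the slide, with minimum support 3
docs = ["abce", "cde", "abce", "acde", "abcde", "bcd"]
transactions = [set(d) for d in docs]
min_sup = 3

def support(itemset):
    # number of transactions containing the itemset
    return sum(itemset <= t for t in transactions)

# enumerate every frequent itemset over the item universe
items = sorted(set().union(*transactions))
frequent = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        s = support(set(combo))
        if s >= min_sup:
            frequent[frozenset(combo)] = s

# closed: no proper superset with the same support
closed = {p for p in frequent
          if not any(p < q and frequent[q] == frequent[p] for q in frequent)}
# maximal: no frequent proper superset at all
maximal = {p for p in frequent if not any(p < q for q in frequent)}
```

Running this reproduces the table: the closed itemsets are c, ce, ace, bc, cd, abce and cde, and the maximal ones are abce and cde.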
22. Graph Coresets
Coreset of a set P with respect to some problem
A small subset that approximates the original set P: solving the problem for the coreset provides an approximate solution for the problem on P.
δ-tolerance Closed Graph
A graph g is δ-tolerance closed if none of its proper frequent supergraphs has a weighted support ≥ (1 − δ) · support(g).
Maximal graph: 1-tolerance closed graph
Closed graph: 0-tolerance closed graph
24. Graph Coresets
Relative support of a closed graph
The support of the graph minus the sum of the relative supports of its closed supergraphs. Thus the relative support of a graph plus the relative supports of its closed supergraphs equals its own support.
(s, δ)-coreset for the problem of computing closed graphs
A weighted multiset of frequent δ-tolerance closed graphs with minimum support s, using their relative support as weight.
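The relative-support definition is a short recursion. A toy sketch, with itemsets standing in for graphs (the patterns and numbers below are illustrative only): {a} has closed supersets {a,b} and {a,c}.

```python
from functools import lru_cache

# Toy closed patterns with their supports (illustrative values)
closed_support = {
    frozenset("a"): 5,
    frozenset("ab"): 3,
    frozenset("ac"): 1,
}

@lru_cache(maxsize=None)
def relative_support(p):
    # support of p minus the relative supports of its closed proper supersets
    return closed_support[p] - sum(relative_support(q)
                                   for q in closed_support if p < q)
```

Here relative_support({a}) = 5 − 3 − 1 = 1, and the identity stated above holds: the relative supports of {a}, {a,b} and {a,c} sum to the support of {a}.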
27. Graph Coresets
[Table: three example molecule graphs built from C, S, N and O atoms, each with relative support 3 and support 3]
Table: Example of a coreset with minimum support 50% and δ = 1
30. INCGRAPHMINER
INCGRAPHMINER(D, min_sup)
Input: A graph dataset D, and min_sup.
Output: The frequent graph set G.
1 G ← ∅
2 for every batch b_t of graphs in D
3    do C ← CORESET(b_t, min_sup)
4       G ← CORESET(G ∪ C, min_sup)
5 return G
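The pseudocode above can be exercised end to end with itemsets standing in for graphs (so no subgraph-isomorphism machinery is needed). The CORESET stand-in below mines closed patterns and weights them by relative support, as on the coreset slides; replaying a summary as weighted transactions is one way to realize CORESET(G ∪ C). A hedged toy sketch, not the paper's implementation:

```python
from itertools import combinations

def closed_patterns(weighted_transactions, min_sup):
    """Frequent closed itemsets with weighted supports."""
    support = {}
    for t, w in weighted_transactions:
        for r in range(1, len(t) + 1):
            for c in combinations(sorted(t), r):
                p = frozenset(c)
                support[p] = support.get(p, 0) + w
    freq = {p: s for p, s in support.items() if s >= min_sup}
    # closed: no proper superset with equal support
    return {p: s for p, s in freq.items()
            if not any(p < q and freq[q] == s for q in freq)}

def coreset(weighted_transactions, min_sup):
    """(min_sup, 0)-coreset: closed patterns weighted by relative support."""
    closed = closed_patterns(weighted_transactions, min_sup)
    rel = {}
    for p in sorted(closed, key=len, reverse=True):   # largest patterns first
        rel[p] = closed[p] - sum(rel[q] for q in rel if p < q)
    return rel

def inc_graph_miner(batches, min_sup):
    """INCGRAPHMINER sketch: fold each batch's coreset into the summary G."""
    G = {}
    for batch in batches:                              # for every batch b_t
        C = coreset([(t, 1) for t in batch], min_sup)  # C <- CORESET(b_t)
        merged = ([(set(p), w) for p, w in G.items()] +
                  [(set(p), w) for p, w in C.items()])
        G = coreset(merged, min_sup)                   # G <- CORESET(G ∪ C)
    return G
```

Note the approximation at work: a pattern that is infrequent within a single batch is pruned from that batch's coreset, so its count can be undercounted in G.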
31. WINGRAPHMINER
WINGRAPHMINER(D, W, min_sup)
Input: A graph dataset D, a window size W, and min_sup.
Output: The frequent graph set G.
1 G ← ∅
2 for every batch b_t of graphs in D
3    do C ← CORESET(b_t, min_sup)
4       Store C in the sliding window
5       if the sliding window is full
6          then R ← oldest C stored in the sliding window, with all support values negated
7          else R ← ∅
8       G ← CORESET(G ∪ C ∪ R, min_sup)
9 return G
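The windowing trick on line 6, replaying the expiring batch with negated supports so that it cancels out of the summary, can be shown in isolation. In this self-contained skeleton pattern extraction is stubbed (each item counts as a pattern); the real algorithm stores batch CORESETs and recomputes CORESET(G ∪ C ∪ R):

```python
from collections import Counter, deque

def win_pattern_counts(batches, window_size):
    """Bookkeeping skeleton of WINGRAPHMINER: pattern counts over the
    last `window_size` batches, maintained by adding each new batch and
    subtracting (via negated counts) the batch that falls out."""
    G = Counter()
    window = deque()
    for batch in batches:
        C = Counter(batch)                    # stand-in for CORESET(b_t)
        window.append(C)
        R = Counter()
        if len(window) > window_size:         # sliding window is full
            oldest = window.popleft()
            R = Counter({p: -w for p, w in oldest.items()})
        G = G + C + R                         # Counter '+' drops counts <= 0
    return G
```

After three batches with a window of two, the first batch's contribution has been cancelled, so G reflects only the last two batches.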
32. Algorithm ADaptive Sliding WINdow
Example
W = 101010110111111
W0 = 1
ADWIN: ADAPTIVE WINDOWING ALGORITHM
1 Initialize window W
2 for each t > 0
3    do W ← W ∪ {x_t} (i.e., add x_t to the head of W)
4       repeat Drop elements from the tail of W
5       until |µ̂_W0 − µ̂_W1| < ε_c holds
6             for every split of W into W = W0 · W1
7       Output µ̂_W
41. Algorithm ADaptive Sliding WINdow
Example
W = 101010110111111    |µ̂_W0 − µ̂_W1| ≥ ε_c : CHANGE DETECTED
W0 = 101010110    W1 = 111111
Elements are then dropped from the tail of W until no split violates the bound.
44. Algorithm ADaptive Sliding WINdow
Theorem
At every time step we have:
1 (False positive rate bound). If µ_t remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.
2 (False negative rate bound). Suppose that for some partition of W into two parts W = W0 · W1 (where W1 contains the most recent items) we have |µ_W0 − µ_W1| > 2ε_c. Then with probability 1 − δ, ADWIN shrinks W to W1, or shorter.
ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.
45. Algorithm ADaptive Sliding WINdow
ADWIN, using a data stream sliding window model:
can provide the exact counts of 1's in O(1) time per point
tries O(log W) cutpoints
uses O((1/ε) log W) memory words
processing time per example is O(log W) (amortized and worst-case)
Sliding Window Model
1010101 101 11 1 1
Content:  4 2 2 1 1
Capacity: 7 3 2 1 1
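The drop-from-the-tail loop of the ADWIN listing can be sketched directly. For clarity this checks every split in O(n) rather than the exponential histogram's O(log W) cutpoints, and the ε_cut expression is a Hoeffding-style stand-in for the paper's exact threshold:

```python
import math

def adwin_shrink(window, delta=0.01):
    """Simplified ADWIN pass over a list of 0/1 (or bounded) values,
    oldest first: while some split W = W0 · W1 has sub-window means
    differing by at least eps_cut, drop one element from the tail
    (the oldest end). Returns (new_window, change_detected)."""
    changed = False
    while True:
        n = len(window)
        shrunk = False
        for i in range(1, n):
            w0, w1 = window[:i], window[i:]
            m0 = sum(w0) / len(w0)
            m1 = sum(w1) / len(w1)
            # harmonic-mean sample size of the two sub-windows
            m = 1.0 / (1.0 / len(w0) + 1.0 / len(w1))
            eps_cut = math.sqrt((1.0 / (2 * m)) * math.log(4.0 * n / delta))
            if abs(m0 - m1) >= eps_cut:
                window = window[1:]       # drop one element from the tail
                changed = shrunk = True
                break
        if not shrunk:
            return window, changed
```

On a stream that jumps from all 0s to all 1s, the window shrinks toward the recent 1s; on a constant stream it is left untouched, matching the false-positive bound above.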
46. ADAGRAPHMINER
ADAGRAPHMINER(D, Mode, min_sup)
1  G ← ∅
2  Init ADWIN
3  for every batch b_t of graphs in D
4     do C ← CORESET(b_t, min_sup)
5        R ← ∅
6        if Mode is Sliding Window
7           then Store C in the sliding window
8                if ADWIN detected change
9                   then R ← batches to remove from the sliding window, with negative support
10       G ← CORESET(G ∪ C ∪ R, min_sup)
11       if Mode is Sliding Window
12          then insert the number of closed graphs into ADWIN
13          else for every g in G, update g's ADWIN
14 return G
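The control flow above can be sketched with both heavy components stubbed out: a Counter stands in for CORESET, and `change_detected(history)` stands in for ADWIN, returning how many of the oldest stored batches to drop (0 = no change). The per-pattern ADWINs of the non-sliding (adaptive) mode are elided. A hedged skeleton, not the paper's implementation:

```python
from collections import Counter, deque

def ada_graph_miner(batches, mode, change_detected):
    """Control-flow skeleton of ADAGRAPHMINER with stubbed components."""
    G = Counter()
    window = deque()
    history = []                            # number of closed patterns per step
    for batch in batches:
        C = Counter(batch)                  # C <- CORESET(b_t, min_sup)
        R = Counter()
        if mode == "sliding":
            window.append(C)
            # on change, replay the dropped batches with negated support
            for _ in range(min(change_detected(history), len(window) - 1)):
                oldest = window.popleft()
                for p, w in oldest.items():
                    R[p] -= w
        G = G + C + R                       # G <- CORESET(G ∪ C ∪ R)
        if mode == "sliding":
            history.append(len(G))          # feed #closed graphs to "ADWIN"
    return G
```

With a detector that never fires, this reduces to the incremental miner; with one that always fires, the summary tracks only the most recent batch.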
50. Experimental Evaluation
ChemDB dataset
Public dataset of 4 million molecules
Institute for Genomics and Bioinformatics, University of California, Irvine
Open NCI Database
Public domain, 250,000 structures
National Cancer Institute
51. What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.
It is closely related to WEKA
It includes a collection of offline and online methods as well as tools for evaluation:
classification
clustering
Easy to extend
Easy to design and run experiments
60. Clustering Experimental Setting
Internal measures External measures
Gamma Rand statistic
C Index Jaccard coefficient
Point-Biserial Folkes and Mallow Index
Log Likelihood Hubert Γ statistics
Dunn’s Index Minkowski score
Tau Purity
Tau A van Dongen criterion
Tau C V-measure
Somer’s Gamma Completeness
Ratio of Repetition Homogeneity
Modified Ratio of Repetition Variation of information
Adjusted Ratio of Clustering Mutual information
Fagan’s Index Class-based entropy
Deviation Index Cluster-based entropy
Z-Score Index Precision
D Index Recall
Silhouette coefficient F-measure
Table: Internal and external clustering evaluation measures.
61. Cluster Mapping Measure
Hardy Kremer, Philipp Kranen, Timm Jansen, Thomas Seidl, Albert Bifet, Geoff Holmes and Bernhard Pfahringer. An Effective Evaluation Measure for Clustering on Evolving Data Streams. KDD'11
CMM: Cluster Mapping Measure
A novel evaluation measure for stream clustering on evolving streams
CMM(C, CL) = 1 − ( Σ_{o∈F} w(o) · pen(o, C) ) / ( Σ_{o∈F} w(o) · con(o, Cl(o)) )
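Evaluating the CMM formula itself is a one-liner once the per-object quantities are known. In this sketch each fault object o ∈ F is given as a (w, pen, con) triple (that tuple representation is ours): weight w(o), penalty pen(o, C) against the found clustering C, and connectivity con(o, Cl(o)) to its ground-truth cluster. Computing pen and con is the paper's contribution and is assumed done upstream:

```python
def cmm(fault_objects):
    """CMM(C, CL) = 1 - sum(w*pen) / sum(w*con) over the fault set F,
    with fault_objects a list of (w, pen, con) triples."""
    num = sum(w * pen for w, pen, _ in fault_objects)
    den = sum(w * con for w, _, con in fault_objects)
    # with no faults (or zero connectivity mass) the clustering is unpenalized
    return 1.0 if den == 0 else 1.0 - num / den
```

A perfect clustering (all penalties zero) scores 1; larger weighted penalties relative to connectivity drive the score toward 0.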
62. Extensions of MOA
Multi-label Classification
Active Learning
Regression
Closed Frequent Graph Mining
Twitter Sentiment Analysis
Challenges for bigger data streams
Sampling and distributed systems (Map-Reduce, Hadoop, S4)
69. Summary
Mining Evolving Graph Data Streams
Given a data stream D of graphs, find frequent closed graphs.
[Table: example transactions 1–3, each a small molecule graph built from C, S, N and O atoms]
We provide three algorithms, of increasing power:
Incremental
Sliding Window
Adaptive
70. Summary
{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.
http://moa.cs.waikato.ac.nz
It is closely related to WEKA
It includes a collection of offline and online methods as well as tools for evaluation:
classification
clustering
frequent pattern mining
MOA deals with evolving data streams
MOA is easy to use and extend