React Native vs Ionic - The Best Mobile App Framework
Adaptive XML Tree Mining on Evolving Data Streams
1. Adaptive XML Tree Mining on Evolving Data Streams
Albert Bifet
Laboratory for Relational Algorithmics, Complexity and Learning LARCA
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
Porto, 21 May 2009
2. Mining Evolving Massive Structured Data
The basic problem
Finding interesting structure
on data
Mining massive data
Mining time varying data
Mining on real time
Mining XML data
The Disintegration of Persistence
of Memory 1952-54
Salvador Dalí
2 / 30
3. XML Tree Classification on evolving data
streams
D D D D
B B B B B B B
C C C C C C
A A
C LASS 1 C LASS 2 C LASS 1 C LASS 2
D
Figure: A dataset example
3 / 30
4. Tree Pattern Mining
Given a dataset of trees, find the
complete set of frequent subtrees
Frequent Tree Pattern (FT):
Include all the trees whose
support is no less than min_sup
Closed Frequent Tree Pattern
(CT):
Include no tree which has a
Trees are sanctuaries. super-tree with the same
Whoever knows how support
to listen to them,
can learn the truth. CT ⊆ FT
Herman Hesse
4 / 30
5. Mining Closed Frequent Trees
Our trees are: Our subtrees are:
Labeled and Unlabeled Induced
Ordered and Unordered Top-down
Two different ordered trees
but the same unordered tree
5 / 30
6. A tale of two trees
Consider D = {A, B}, where Frequent subtrees
A B
A:
B:
and let min_sup = 2.
6 / 30
7. A tale of two trees
Consider D = {A, B}, where Closed subtrees
A B
A:
B:
and let min_sup = 2.
6 / 30
8. XML Tree Classification on evolving data
streams
D D D D
B B B B B B B
C C C C C C
A A
C LASS 1 C LASS 2 C LASS 1 C LASS 2
D
Figure: A dataset example
7 / 30
9. XML Tree Classification on evolving data
streams
Tree Trans.
Closed Freq. not Closed Trees 1 2 3 4
D
B B
c1 C C C C 1 0 1 0
D
B B C A
C C A
c2 A A 1 0 0 1
8 / 30
10. XML Tree Classification on evolving data
streams
Frequent Trees
c1 c2 c3 c4
Id 1
c1 f1 1 2 3
c2 f2 f2 f2 1
c3 f3 1 2 3 4 5
c4 f4 f4 f4 f4 f4
1 1 1 1 1 1 1 0 0 1 1 1 1 1 1
2 0 0 0 0 0 0 1 1 1 1 1 1 1 1
3 1 1 0 0 0 0 1 1 1 1 1 1 1 1
4 0 0 1 1 1 1 1 1 1 1 1 1 1 1
Closed Maximal
Trees Trees
Id Tree c1 c2 c3 c4 c1 c2 c3 Class
1 1 1 0 1 1 1 0 C LASS 1
2 0 0 1 1 0 0 1 C LASS 2
3 1 0 1 1 1 0 1 C LASS 1
4 0 1 1 1 0 1 1 C LASS 2
9 / 30
11. XML Tree Framework on evolving data
streams
XML Tree Classification Framework Components
An XML closed frequent tree miner
A Data stream classifier algorithm, which we will feed with tuples
to be classified online.
10 / 30
12. Mining Evolving Tree Data Streams
Problem
Given a data stream D of rooted and unordered trees, find
frequent closed trees.
We provide three algorithms,
of increasing power
Incremental
Sliding Window
Adaptive
D
11 / 30
13. Mining Closed Unordered Subtrees
C LOSED _S UBTREES(t, D, min_sup, T )
1
2
3 for every t that can be extended from t in one step
4 do if Support(t ) ≥ min_sup
5 then T ← C LOSED _S UBTREES(t , D, min_sup, T )
6
7
8
9
10 return T
12 / 30
14. Mining Closed Unordered Subtrees
C LOSED _S UBTREES(t, D, min_sup, T )
1 if not C ANONICAL _R EPRESENTATIVE(t)
2 then return T
3 for every t that can be extended from t in one step
4 do if Support(t ) ≥ min_sup
5 then T ← C LOSED _S UBTREES(t , D, min_sup, T )
6
7
8
9
10 return T
12 / 30
15. Mining Closed Unordered Subtrees
C LOSED _S UBTREES(t, D, min_sup, T )
1 if not C ANONICAL _R EPRESENTATIVE(t)
2 then return T
3 for every t that can be extended from t in one step
4 do if Support(t ) ≥ min_sup
5 then T ← C LOSED _S UBTREES(t , D, min_sup, T )
6 do if Support(t ) = Support(t)
7 then t is not closed
8 if t is closed
9 then insert t into T
10 return T
12 / 30
18. Experimental results
TreeNat CMTreeMiner
Unlabeled Trees Labeled Trees
Top-Down Subtrees Induced Subtrees
No Occurrences Occurrences
14 / 30
19. Closure Operator on Trees
D: the finite input dataset of trees
T : the (infinite) set of all trees
Definition
We define the following the Galois connection pair:
For finite A ⊆ D
σ (A) is the set of subtrees of the A trees in T
σ (A) = {t ∈ T ∀ t ∈ A (t t )}
For finite B ⊂ T
τD (B) is the set of supertrees of the B trees in D
τD (B) = {t ∈ D ∀ t ∈ B (t t )}
Closure Operator
The composition ΓD = σ ◦ τD is a closure operator.
15 / 30
21. Galois Lattice of closed set of trees
1 2 3
D
B={ } 12 13 23
123
17 / 30
22. Galois Lattice of closed set of trees
B={ }
1 2 3
τD (B) = { , }
12 13 23
123
17 / 30
23. Galois Lattice of closed set of trees
B={ }
1 2 3
τD (B) = { , }
12 13 23
ΓD (B) = σ ◦τD(B) = { and its subtrees }
123
17 / 30
24. Algorithms
Algorithms
Incremental: I NC T REE N AT
Sliding Window: W IN T REE N AT
Adaptive: A DAT REE N AT Uses ADWIN to monitor change
ADWIN
An adaptive sliding window whose size is recomputed online
according to the rate of change observed.
ADWIN has rigorous guarantees (theorems)
On ratio of false positives and false negatives
On the relation of the size of the current window and change
rates
18 / 30
25. Experimental Validation: TN1
CMTreeMiner
300
Time 200
(sec.)
100
I NC T REE N AT
2 4 6 8
Size (Milions)
Figure: Experiments on ordered trees with TN1 dataset
19 / 30
26. What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learning
from data streams.
It is closely related to WEKA
It includes a collection of offline and online as well as tools for
evaluation:
boosting and bagging
Hoeffding Trees
with and without Naïve Bayes classifiers at the leaves.
20 / 30
28. MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
22 / 30
29. MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
22 / 30
30. MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
22 / 30
31. Data stream classification cycle
1 Process an example at a
time, and inspect it only
once (at most)
2 Use a limited amount of
memory
3 Work in a limited amount
of time
4 Be ready to predict at any
point
23 / 30
32. Environments and Data Sources
Environments
Sensor Network: 100Kb
Handheld Computer: 32 Mb
Server: 400 Mb
Data Sources
Random Tree Generator
Random RBF Generator
LED Generator
Waveform Generator
Function Generator
24 / 30
33. Algorithms
Naive Bayes Prediction strategies
Decision stumps Majority class
Hoeffding Tree Naive Bayes Leaves
Hoeffding Option Tree Adaptive Hybrid
Bagging and Boosting
25 / 30
34. Hoeffding Option Tree
Hoeffding Option Trees
Regular Hoeffding tree containing additional option nodes that
allow several tests to be applied, leading to multiple Hoeffding
trees as separate paths.
26 / 30
37. Ensemble Methods
http://www.cs.waikato.ac.nz/∼abifet/MOA/
New ensemble methods:
ADWIN bagging: When a change is detected, the worst classifier
is removed and a new classifier is added.
Adaptive-Size Hoeffding Tree bagging
28 / 30
38. XML Tree Framework on evolving data
streams
Maximal Closed
# Trees Att. Acc. Mem. Att. Acc. Mem.
CSLOG12 15483 84 79.64 1.2 228 78.12 2.54
CSLOG23 15037 88 79.81 1.21 243 78.77 2.75
CSLOG31 15702 86 79.94 1.25 243 77.60 2.73
CSLOG123 23111 84 80.02 1.7 228 78.91 4.18
Table: BAGGING on unordered trees.
29 / 30
39. Conclusions
XML tree stream classifier system.
Using Galois Latice Theory, we present methods for mining
closed trees
Incremental
Sliding Window
Adaptive: using ADWIN to monitor change
We use MOA data stream classifiers.
30 / 30