Gen AI in Business - Global Trends Report 2024.pdf
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
1. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Handling Concept Drift:
Importance, Challenges & Solutions
A. Bifet, J. Gama, M. Pechenizkiy, I. Žliobaitė
PAKDD-2011 Tutorial
May 27, Shenzhen, China
http://www.cs.waikato.ac.nz/~abifet/PAKDD2011/
2. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Motivation for the Tutorial
• in the real world data
– more and more data is organized as data streams
– data comes from complex environment, and it is
evolving over time
– concept drift = underlying distribution of data is
changing
• attention to handling concept drift is increasing,
since predictions become less accurate over time
• to prevent that, learning models need to adapt
to changes quickly and accurately
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
2
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
3. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Objectives of the Tutorial
• many approaches to handle drifts in a wide
range of applications have been proposed
• the tutorial aims to provide
an integrated view of the adaptive learning
methods and application needs
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
3
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
4. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Outline
Block 1: PROBLEM
Block 2: TECHNIQUES
Block 3: EVALUATION
Block 4: APPLICATIONS
OUTLOOK
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
4
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
5. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Presenters & Schedule
Indrė Žliobaitė Block 1: what concept
drift is, why it needs
14:05 – 14:50
to be handled and how
João Gama
14:50 – 15:40 Block 2: approaches for
adaptive machine learning
to handle concept drift
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
5
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
6. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Presenters & Schedule
Albert Bifet
Block 3: training/evaluation
16:00 – 16:45 of adaptive learning approaches
and a software demo
Mykola Pechenizkiy
Block 4: challenges for
handling concept drift
16:45 – 17:30
in different applications
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
6
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
7. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
7
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
8. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Block 1: introduction
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
8
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
9. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
The goals:
• to introduce the problem of concept drift in
supervised learning
• to overview and categorize the main
principles how concept drift can be (and is)
handled
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
9
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
10. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Example: production
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
10
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
11. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
many process
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook many process
parameters parameters
Example: production
prediction
predict quality
reduced and
transformed
parameters
t1, t2 q
3
79
monitor 78
3S
Konz. (INA&INAL)
2
2S
77
1 76
75 Target
t[2]
0
74
-1 73
2S
72
-2
3S
0 100 200 300 400
-3
-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 Num
t[1]
SIMCA-P+ 11 - 13.12.2005 10:17:16
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
11
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
12. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
many process
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook many process
parameters parameters
Example: production
prediction
predict quality
reduced and
transformed
parameters
t1, t2 q
3
79
monitor 78
3S
Konz. (INA&INAL)
2
2S
77
1 76
75 Target
t[2]
0
74
-1 73
2S
72
-2
3S
0 100 200 300 400
-3
-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 Num
t[1]
SIMCA-P+ 11 - 13.12.2005 10:17:16
after 2-3 months
predictions get
“out of tune”
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
12
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
13. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Static model
Model does not
change
Process changes
source: Evonik Industries
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
13
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
14. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
the supplier of raw a sensor breaks/ new operating
material changes “wears off”/ crew
is replaced
Model does not
change
Process changes
source: Evonik Industries
new regulations to new production
save electricity procedures
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
14
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
15. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Desired Properties of a System To
Handle Concept Drift
• Adapt to concept drift asap
• Distinguish noise from changes
– robust to noise, but adaptive to changes
• Recognizing and reacting
to reoccurring contexts
• Adapting with limited resources
– time and memory
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
15
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
16. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Setting: adaptive learning
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
16
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
17. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Supervised Learning
Training:
Learning
Testing X' a mapping function
population data
y = L (x)
2.
Application:
Applying L
to unseen data
X Historical data Learner y' = L (X')
1. training
y labels 2. application
testing labels ? y'
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
17
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
18. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Learning with concept drift
population
Training:
= ?? Testing
data
X' y = L (x)
population
Application:
y' = L?? (X')
X Historical data Learner
y labels
testing labels ? y'
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
18
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
19. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Online setting
?
time
Data arrives in a stream
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
19
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
20. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptive Learning
?
time
updated
model
as time passes
the model updates/retrains prediction
itself if needed
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
20
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
21. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
How to Adapt
Select the right
training data and or adjust
retrain the model
incrementally
?
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
21
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
22. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
The main strategies how
to handle concept drift
Images.com
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
22
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
23. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptive learning strategies
Triggering Evolving
Single classifier
Detectors Forgetting
variable windows fixed windows,
Instance weighting
Ensemble Dynamic
Contextual ensemble
adaptive
dynamic integration,
combination rules
meta learning
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
23
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
24. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptive learning strategies
Triggering Evolving
change detection and adapting at every step
a follow up reaction
Single classifier
Detectors Forgetting
variable windows fixed windows,
Instance weighting
Ensemble Dynamic
Contextual ensemble
adaptive
dynamic integration,
combination rules
meta learning
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
24
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
25. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptive learning strategies
reactive,
Triggering Evolving
forgetting
Single classifier
Detectors Forgetting
variable windows fixed windows,
Instance weighting
Ensemble Dynamic
Contextual ensemble
maintain adaptive
dynamic integration,
some combination rules
meta learning
memory
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
25
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
26. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptive learning strategies
Triggering Evolving
forget old data and retrain at
a fixed rate
Single classifier
Detectors Forgetting
fixed windows,
Instance weighting
Ensemble Dynamic
Contextual ensemble
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
26
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
27. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Fixed Training Window
time
train predict
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
27
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
28. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Fixed Training Window
time
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
28
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
29. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptive learning strategies
Triggering Evolving
detect a change and cut
Single classifier
Detectors Forgetting
variable windows
Ensemble Dynamic
Contextual ensemble
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
29
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
30. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Variable Training Window
detect a change
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
30
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
31. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Variable Training Window
drop old data
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
31
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
32. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Variable Training Window
retrain the model predict
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
32
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
33. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptive learning strategies
Triggering Evolving
Single classifier
Detectors Forgetting
Ensemble Dynamic
Contextual ensemble
adaptive
build many models, combination rules
dynamically combine
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
33
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
34. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Dynamic Ensemble
Classifier 1
Classifier 2
vote
Classifier 3
Classifier 4
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
34
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
35. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Dynamic Ensemble
voter 1 time
voter 2
voter 3
voter 4
TRUE
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
35
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
36. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Dynamic Ensemble
punish
voter 1 voter 1
reward
voter 2 voter 2
punish
voter 3 voter 3
reward
voter 4 voter 4
TRUE
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
36
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
37. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Dynamic Ensemble
punish
voter 1 voter 1
reward
voter 2 voter 2
punish
voter 3 voter 3
reward
voter 4 voter 4
TRUE TRUE
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
37
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
38. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Dynamic Ensemble
punish
voter 1 voter 1
voter 1
reward
voter 2 voter 2
voter 2
punish
voter 3 voter 3
voter 3
reward
voter 4 voter 4
voter 4
TRUE TRUE
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
38
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
39. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptive learning strategies
Triggering Evolving
Single classifier
Detectors Forgetting
Ensemble Dynamic
Contextual ensemble
build many models, dynamic integration,
switch models according meta learning
to the observed incoming data
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
39
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
40. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Contextual (Meta) Approaches
partition the training data
Group 1 = Classifier 1 build classifiers
Group 3 = Classifier 3
Group 2 = Classifier 2
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
40
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
41. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Contextual (Meta) Approaches
find to which partition
the new instance belongs and
assign the classifier
Group 1 = Classifier 1
Group 3 = Classifier 3
Group 2 = Classifier 2
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
41
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
42. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
• adaptive learning approaches implicitly or
explicitly assume some type of change
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
42
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
43. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Types of Changes: speed
mean
SUDDEN
time
mean
GRADUAL
time
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
43
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
44. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Types of Changes: reoccurrence
mean
ALWAYS NEW*
time
mean
time
mean
REOCCURING
time
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
44
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
45. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Typically Used*
Triggering Evolving
(change detection) (adapting every step)
sudden drift
Single classifier
Detectors Forgetting
variable windows fixed windows,
Instance weighting
Ensemble Dynamic
Contextual ensemble
adaptive
* this mapping is dynamic integration,
fusion rules
not strict or crisp meta learning
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
45
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
46. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Typically Used
Triggering Evolving
(change detection) (adapting every step)
Single classifier
Detectors Forgetting
variable windows fixed windows,
Instance weighting
Ensemble Dynamic
Contextual ensemble
adaptive
dynamic integration,
fusion rules
meta learning
PAKDD-2011 Tutorial,
May 27, Shenzhen, China gradual drift 46
47. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Typically Used
Triggering Evolving
(change detection) (adapting every step)
Single classifier
Detectors Forgetting
reoccuring drift variable windows fixed windows,
Instance weighting
Ensemble Dynamic
Contextual ensemble
adaptive
dynamic integration,
fusion rules
meta learning
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
47
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
48. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Which approach to use?
• changes occur over time
• we need models that evolve over time
• choice of technique depends on
– what type of change is expected
– user goals/ applications
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
48
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
49. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Block summary
Images.com
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
49
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
50. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Block summary
• Concept drift is a problem in data stream
applications, since static models lose accuracy
• Different adaptive techniques exist
• The choice of technique depends on
– what change is expected
– peculiarities of the task and application
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
50
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
51. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Block 2: methods and techniques
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
51
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
52. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
The goal:
to discuss selected popular approaches from
each of the major types in more detail
• Data management Forgetting
• Detection Methods Detectors
Dynamic
• Adaptation methods ensembles
• Decision model management Contextual
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
52
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
53. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Input Data
Short term Memory
Data Management
Control
Learning
Detection
Adaptation
Memory
Model Management
Model Adaptation
PAKDD-2011 Tutorial, Output Model 53
May 27, Shenzhen, China
54. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Data Management
• Characterize the information stored in
memory to maintain a decision model
consistent with the actual state of the nature.
– Full Memory.
– Partial Memory.
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
54
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
55. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Weighting examples
• Full Memory.
– Store in memory sufficient statistics over all the examples.
– Weighting the examples accordingly to their age.
– Oldest examples are less important.
• Weighting examples based on the age:
wλ ( x) = exp(−λt x )
where example x was found tx time steps ago
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
55
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
56. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Weight as a Function of Age
PAKDD-2011 Tutorial,
56
May 27, Shenzhen, China
57. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Partial Memory
• Partial Memory.
– Store in memory only the most recent examples.
– Examples are stored in a first-in first-out data structure.
– At each time step the learner induces a decision model
using only the examples that are included in the window.
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
57
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
58. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Partial Memory
• The key difficulty is how to select the appropriate
window size:
– Small window
• can assure a fast adaptability in phases with concept
changes
• In more stable phases it can affect the learner
performance
– Large window
• produce good and stable learning results in stable
phases
• can not react quickly to concept changes.
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
58
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
59. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Partial Memory - Windows
• Fixed Size windows.
– Store in memory a fixed number of the most recent
examples.
– Whenever a new example is available:
• it is stored in memory and
• the oldest one is discarded.
– This is the simplest method to deal with concept drift and
can be used as a baseline for comparisons.
• Adaptive Size windows.
– the set of examples in the window is variable.
– They are used in conjunction with a detection model.
– Decreasing the size of the window whenever the detection
model signals drift and increasing otherwise.
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
59
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
60. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Discussion
• The data management model also indicates the
forgetting mechanism.
– Weighting examples:
• Corresponds to a gradual forgetting.
• The relevance of old information is less and less important.
– Time windows
• Corresponds to abrupt forgetting;
• The examples are deleted from memory.
– Of course we can combine both forgetting
mechanisms by weighting the examples in a time
window.
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
60
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
61. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Discussion
• These methods are blind adaptation models:
– There is no explicit change detection
– Do not provide:
• Any indication about change points
• Dynamics of the process generating data
PAKDD-2011 Tutorial, A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite
61
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
62. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Detection Methods
• The Detection Model characterizes the techniques
and mechanisms for explicit drift detection.
– An advantage of the detection model is they can
provide:
• Meaningful descriptions:
– indicating change-points or
– small time-windows where the change occurs
• Quantification of the changes.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
62
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
63. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Detection Methods
• Monitoring the evolution of performance
indicators.
(See Klinkenberg (98) for a good overview of these
indicators)
– Performance measures,
– Properties of the data, etc
• Monitoring distributions on two different time-
windows
(See D.Kifer, S.Ben-David, J.Gehrke; VLDB 04)
– A reference window over past examples
– A window over the most recent examples
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
63
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
64. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Monitoring the evolution of performance
indicators
• Klinkenberg (1998)proposed monitoring the
values of three performance indicators:
– Accuracy, recall and precision over time,
– and then, comparing it to a confidence interval of
standard sample errors
– for a moving average value using the last M batches
of each particular indicator.
• FLORA family of algorithms (Widmer and Kubat, 1996)
monitors accuracy and model size.
– includes a window adjustment heuristic for a rule-
based classifier.
– FLORA3: dealing with recurring concepts.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
64
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
65. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Monitoring distributions on two
different time-windows
• D.Kifer, S.Ben-David, J.Gehrke (VLDB 04)
– Propose algorithms (statistical tests based on Chernoff
bound) that examine samples drawn from two probability
distributions and decide whether these distributions are
different.
• Gama, Fernandes, Rocha: VFDTc (IDA 2006)
– For streaming data decision tree induction
• Exploits tree structure: nested trees
– Continuously monitoring differences between two class-
distribution of the examples:
• the distribution when a node was built and the class-distribution
when a node was a leaf and
• the weighted sum of the class-distributions in the leaves
descendant of that node.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
65
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
66. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Hoeffding Trees
• Multiple windows
– Each node recives information from different windows
• Root node: oldest data
• Leaves: most recent data
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
66
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
67. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
The RS Method
• Implemented in the VFDTc system (IDA 2006)
• For each decision node i, compute two estimates
of the classification error.
– Static error (SEi):
• the distribution of the error at node i ;
– Backed up error (BUEi):
• the sum of the error distributions of all the descending
leaves of the node i ;
– With these two distributions:
• we can detect the concept change,
• by verifying the condition SEi ≤ BUEi
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
67
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
68. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
RS Method
• Each new example traverses
the tree from the root to a leaf
– At the leaf: update the value of
SEi
– It makes the opposite path, and
update the values of SEi and BUEi
for each internal node,
– Verify the regularization
condition
SEi ≤ BUEi :
• If SEi ≤ BUEi then the node i is
pruned to a new leaf.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
68
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
69. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
69
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
70. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
ADAPTATION METHODS
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
70
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
71. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Adaptation Methods
• The Adaptation model characterizes the
changes in the decision model do adapt to the
most recent examples.
– Blind Methods:
• Methods that adapt the learner at regular intervals
without considering whether changes have really
occurred.
– Informed Methods:
• Methods that only change the decision model after a
change was detected. They are used in conjunction
with a detection model.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
71
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
72. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Granularity of Decision Models
• Occurrences of drift can have impact in part of the
instance space.
– Global models: Require the reconstruction of all
the decision model.
• like naive Bayes, SVM, etc
– Granular decision models: Require the
reconstruction of parts of the decision model
• like decision rules, decision trees
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
72
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
73. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
CVFDT algorithm
• Mining time-changing data streams , G. Hulten, L.
Spencer, and P. Domingos; ACM SIGKDD 2001
• Process examples from the stream indefinitely.
– For each example (x, y),
• Pass (x, y) down to a set of leaves using HT and all alternate trees of
the nodes (x, y) passes through.
• Add (x, y) to the sliding window of examples.
• Periodically scans the internal nodes of HT.
• Start a new alternate tree when a new winning attribute
is found.
– Tighter criteria to avoid excessive alternate tree creation.
– Limit the total number of alternate trees
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
73
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
74. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Smoothly adjust to concept drift
• Alternate trees are grown the
Age<30?
same way HT is.
Yes No
• Periodically each node with
non-empty alternate trees
enter a testing mode. Married? Car Type= Yes
Sports Car?
– M training examples to Yes No
compare accuracy. Yes No
– Prune alternate trees with Yes No Experience
No
non-increasing accuracy over <1 year?
time. Yes No
– Replace if an alternate tree is
more accurate. No Yes
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
74
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
75. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Decision model management
• Model management characterizes the number
of decision models needed to maintain in
memory.
• The key issue here is the assumption that data
generated comes from multiple distributions,
– at least in the transition between contexts.
– Instead of maintaining a single decision model
several authors propose the use of multiple
decision models.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
75
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
76. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Dynamic Weighted Majority
• A seminal work, is the system presented by
Kolter and Maloof (ICDM03, ICML05).
• The Dynamic Weighted Majority algorithm
(DWM) is an ensemble method for tracking
concept drift.
– Maintains an ensemble of base learners,
– Predicts using a weighted-majority vote of these
experts.
– Dynamically creates and deletes experts in response
to changes in performance.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
76
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
77. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
DETECTION ALGORITHMS
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
77
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
78. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Change Detection in Predictive Learning
• When there is a change in the class-distribution of the
examples:
– The actual model does not correspond any more to the
actual distribution.
– The error-rate increases
• Basic Idea:
– Learning is a process.
– Monitor the quality of the learning process:
• Using Statistical Process Control techniques.
– Monitor the evolution of the error rate.
– Main Problems:
• How to distinguish Change from Noise?
• How to React to drift?
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
78
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
79. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
SPC-Statistical Process Control
Learning with Drift Detection ,Gama, Medas, Gladys, Rodrigues; SBIA-
LNCS Springer, 2004.
• Suppose a sequence of examples in the form < xi,yi>
• The actual decision model classifies each example in the sequence
– In the 0-1 loss function, predictions are either True or False
– The predictions of the learning algorithm are sequences:
T, F,T, F,T, F,T,T,T, F, …
– The Error is a random variable from Bernoulli trials.
– The Binomial distribution gives the general form of the
probability of observing a F:
• pi = (#F/i)
• where i is the number of trials.
si = pi (1 − pi ) / i
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
79
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
80. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Statistical Processing Control Algorithm
• Maintains two registers:
– Pmin and Smin such that Pmin + Smin = min(pi + si )
– Minimum of the error rate taking into account the variance
of the estimator.
• At example j : the error of the learning algorithm will be
– Out-control if pj + sj > pmin + α smin
– In-control if pj + sj < pmin + β smin
– Warning Level: if pmin + α smin > pj + sj > pmin + smin
• The constants α and β depend on the desired
confidence level.
– Admissible values are = 2 and = 3.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
80
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
81. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Statistical Processing Control
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
81
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
82. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Illustrative Example: SEA Concepts
• The stream:
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
82
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
83. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
The Individual Error of the Models
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
83
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
84. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
System Error
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
84
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
85. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Sequential Analysis
• Sequential analysis developed a set of
methods for monitoring change detection:
– Sequential Probability Ratio Test - SPRT algorithm,
D.Wald, Sequential Tests of Statistical
Hypotheses. Annals of Mathematical
Statistics 16 (2): 117–186; 1945
– Cumulative Sums - CUSUM: E. Page, Continuous
Inspection Scheme. Biometrika 41,1954
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
85
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
86. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
CUSUM Algorithms
The CUSUM test is used to detect significant increases (or
decreases) in the successive observations of a random variable x
• Detecting significant increases: • Detecting decreases:
– g0 = 0 – g0 = 0
– gt = max(0; gt-1+ (xt - α)) – gt = min(0; gt-1+ (xt - α))
• The decision rule is: • The decision rule is:
– if gt > λ then alarm and gt = 0. – if gt < λ then alarm and gt = 0.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
86
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
87. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Page-Hinckley Test
• The PH test is a sequential adaptation of the
detection of an abrupt change in the average
of a Gaussian signal.
– It considers a cumulative variable mT , defined as
the cumulated difference between the observed
values and their mean till the current moment:
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
87
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
88. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
The Page-Hinckley Test
• The minimum value of mt is also computed:
– MT = min(mt ; t = 1, …,T).
• The test monitors the difference between MT and
mT :
– PHT = mT - MT .
• When this difference is greater than a given
threshold (λ) we alarm a change in the
distribution.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
88
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
89. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Illustrative Example
– The left figure plots the on-line error rate of a learning
algorithm.
– The center plot is the accumulated on-line error. The slope
increases after the occurrence of a change.
– The right plot presents the evolution of the PH statistic.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
89
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
90. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Page-Hinckley Test
• The PHt test is memoryless, and its accuracy
depends on the choice of parameters α and λ.
– Both parameters are relevant to control the trade-
off between earlier detecting true alarms by
allowing some false alarms.
• The threshold λ depends on the admissible false
alarm rate.
– Increasing λ will entail fewer false alarms, but
might miss some changes.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
90
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
91. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Algorithm Adaptive Window-ADWIN
• Learning from Time-Changing Data with Adaptive Windowing,
A.Bifet, R.Gavaldà (SDM'07)
• Whenever two “large enough” subwindows of W exhibit
“distinct enough” averages,
– we can conclude that the corresponding expected values are different,
and
– the older portion of the window is dropped
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
91
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
92. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Algorithm ADWIN
• The value of εcut for a partition W0 . W1 of W is
computed as follows:
– Let n0 and n1 be the lengths of W0 and W1 and
– Let n be the length of W, so n = n0 + n1.
– Let ηW0 and ηW1 be the averages of the values in W0 and W1,
and W0 and W1 their expected values.
– To obtain totally rigorous performance guarantees we define:
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
92
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
93. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Characteristics of Algorithm ADWIN
• False positive rate bound:
– If ηt remains constant within W, the probability that
ADWIN shrinks the window at this step is at most δ.
• False negative rate bound.
– Suppose that for some partition of W in two parts
W0W1 (where W1 contains the most recent items) we
have:
|η W0 - η W1| > 2*εcut.
Then with probability 1 - δ ADWIN shrinks W to W1,
or shorter.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
93
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
94. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Illustrative Examples
Abrupt Change Gradual Change
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
94
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
95. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
ADWIN – Compressing Space
• Using exponential histograms
• Requires O(log W) space and time
• It can provide exact counts for all the subwindows
Example: Counting 1’s in a bit stream
Compress: Compress again:
A new 1 arrives: There are 3 buckets of size 1: There are 3 buckets of size 2:
Remove buckets when a drift is detect:
96. What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook
Summary
Dimensions of Analysis:
• Data management
• Characterize the information stored in memory to maintain
a decision model consistent with the actual state of the
nature
• Detection Methods
• Characterizes the techniques and mechanisms for explicit
drift detection
• Adaptation methods
• Characterizes the changes in the decision model do adapt to
the most recent examples.
• Decision model management
• Characterize the number of decision models needed to
maintain in memory.
PAKDD-2011 Tutorial, A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite
96
May 27, Shenzhen, China Handling Concept Drift: Importance, Challenges and Solutions
97. What
CD
iis
About
::
Types
of
Dri5s
&
Approaches
::
Techniques
iin
Detail
::
Evalua=on
:::
MOA
Demo
:::
Types
of
Applica=ons
:::
Outlook
What
CD
s
About
::
Types
of
Dri5s
& Approaches
::
Techniques
n
Detail
::
Evalua.on
:
MOA
Demo
:
Types
of
Applica=ons
:
Outlook
Block
3:
evalua=on
and
MOA
demo
PAKDD-‐2011
Tutorial,
A.
Bifet,
J.Gama,
M.
Pechenizkiy,
I.Zliobaite
98
May
27,
Shenzhen,
China
Handling
Concept
Dri5:
Importance,
Challenges
and
Solu=ons