Online Reinforcement Learning for Adaptive Systems
Andreas Metzger
Inaugural lecture on the occasion of being awarded the title of außerplanmäßiger Professor (adjunct professor)
Essen, 26 April 2022
Agenda
1. Challenges in Engineering Adaptive Systems
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Fundamentals
(Self-)adaptive software system [Salehie & Tahvildari, 2009; Weyns, 2021]
• Observes changes in its environment, its requirements, and itself
• Modifies its structure, parameters, and behavior
Example software life-cycle model [Metzger, 2021]
(Diagram: DEV, OPS, and ADAPT phases; during operation the system self-observes and self-modifies.)
Fundamentals
MAPE-K reference model [Kephart & Chess, 2003; Salehie & Tahvildari, 2009]
Example: adaptive web shop
• Monitor: drastic increase in users (workload)
• Analyze: response time of the web shop is too slow
• Plan: deactivate the optional recommendation functionality
• Execute: replace dynamic recommendations with a static banner
(Diagram: the self-adaptation logic — Monitor, Analyze, Plan, Execute around a shared Knowledge — sits on top of the system logic and is connected to it via sensors and effectors. Monitor: collect and aggregate observation data; Analyze: determine the need for adaptation; Plan: derive concrete adaptations; Execute: carry out the adaptations. A minimal code sketch of this loop follows below.)
(Chart: workload of the web shop over time.)
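To make the MAPE-K loop concrete, here is a minimal sketch in Python of how the web-shop example above could be wired up. All names (the sensor/effector methods, the response-time threshold) are illustrative assumptions, not part of any real framework.

```python
# Minimal MAPE-K sketch for the adaptive web-shop example.
# Sensor/effector interfaces and the threshold value are assumed.

def monitor(sensors):
    """Monitor: collect and aggregate observation data."""
    return {
        "workload": sensors.current_users(),
        "response_time_ms": sensors.avg_response_time_ms(),
    }

def analyze(state, knowledge):
    """Analyze: determine whether an adaptation is needed."""
    return state["response_time_ms"] > knowledge["response_time_limit_ms"]

def plan(state):
    """Plan: derive a concrete adaptation (switch off optional recommendations)."""
    return {"recommendations": "static_banner"}

def execute(adaptation, effectors):
    """Execute: carry out the adaptation via the system's effectors."""
    if adaptation["recommendations"] == "static_banner":
        effectors.replace_dynamic_recommendations_with_banner()

def mape_k_step(sensors, effectors, knowledge):
    state = monitor(sensors)
    if analyze(state, knowledge):
        execute(plan(state), effectors)
```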
Engineering Adaptive Systems
Challenge: "design time uncertainty" [Weyns et al. 2013; Weyns, 2021]
Anticipating possible environment situations
• Which possible environment states should the adaptive system react to?
• Example: different workloads of the web shop
Knowing the effects of adaptations on the system
• What exact effect does which adaptation have in which environment situation?
• Which adaptation is suitable in each case?
• Example: what is the concrete effect of switching off the dynamic recommendations on the response time?
Dealing with non-stationarity ("concept drift")
• Which effects of which adaptations change over time?
• Example: the cloud provider migrates to more powerful machines
→ An adaptation of the web shop then affects the response time differently than before the migration
Agenda
1. Challenges in Engineering Adaptive Systems
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Online Reinforcement Learning
Online reinforcement learning as a solution approach for "design time uncertainty"
[Xu et al. 2012; Jamshidi et al. 2015; Arabnejad et al., 2017; Wang et al. 2020]
• Use reinforcement learning at runtime
• Learn from concrete observations (data, feedback)
(Diagram: the MAPE-K self-adaptation logic on top of the system logic with sensors and effectors, extended by a Learn component that receives feedback and updates the Knowledge.)
Reinforcement Learning (RL)
Basic "model" [Sutton & Barto, 2018]
Goal of RL: maximize the cumulative reward
(Diagram, based on [Sutton & Barto, 2018]: the agent holds a policy and performs action selection and policy updates; the environment receives action A and returns reward R and next state S'.)
Standard example: "cliff walk" [Sutton & Barto, 2018]
• States: the cells of a grid world
• Actions = {UP, DOWN, LEFT, RIGHT}
• Reward: a small negative reward per step, a large negative reward for stepping off the cliff
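The agent–environment interaction can be written as the following generic loop; a sketch in Python following [Sutton & Barto, 2018], where `env`, `select_action`, and `update_policy` are assumed interfaces rather than a specific RL library.

```python
# Generic online RL interaction loop (sketch; interfaces are assumed).

def run_episode(env, select_action, update_policy):
    state = env.reset()
    done = False
    cumulative_reward = 0.0
    while not done:
        action = select_action(state)                     # Action Selection
        next_state, reward, done = env.step(action)       # environment returns R and S'
        update_policy(state, action, reward, next_state)  # Policy Update
        cumulative_reward += reward                       # goal: maximize cumulative reward
        state = next_state
    return cumulative_reward
```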
Policy
Basic representation
Action-value function Q(S, A) = expected cumulative reward of taking action A in state S
(Tables, "cliff walk" example: the learned Q-values — one row per state, one column per action {UP, RIGHT, DOWN, LEFT} — shown at several points during learning, starting from a table of all zeros before learning.)
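In the tabular case the policy's knowledge is exactly such a table: one Q-value per state-action pair. A minimal sketch for the standard 4×12 cliff-walk grid, with values initialized to zero as in the last table above (the grid size follows the textbook example and is an assumption here):

```python
# Tabular action-value function Q(S, A) for the cliff-walk example (sketch).
ACTIONS = ["UP", "RIGHT", "DOWN", "LEFT"]
N_STATES = 4 * 12  # standard 4x12 cliff-walk grid

# Q[s][a] = expected cumulative reward of taking action a in state s,
# initialized to 0 before any learning has happened.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

def greedy_action(state):
    """Return the action with the highest current Q-value in the given state."""
    return max(Q[state], key=Q[state].get)
```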
Action Selection
Principles
• Exploration = accumulating new knowledge
• Exploitation = using existing knowledge
Exploitation-exploration tradeoff
• Per learning step: either exploitation or exploration
• Exploitation maximizes the reward in that single step
• Exploration maximizes the (long-term) cumulative reward
Standard techniques (see the sketch below)
• ε-greedy: with probability ε, explore by choosing a random action; with probability (1 − ε), exploit by choosing the action that is best according to Q (the greedy action)
• ε-decay: step-wise reduction of ε so that the learning process converges
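A minimal sketch of ε-greedy action selection with ε-decay, reusing the tabular Q from above; the initial ε and the decay rate are assumed values for illustration.

```python
import random

EPSILON = 1.0         # initial exploration probability (assumed)
EPSILON_DECAY = 0.97  # decay rate per learning step (assumed)

def epsilon_greedy(state, Q, actions, epsilon):
    """With probability epsilon explore (random action), otherwise exploit (greedy action)."""
    if random.random() < epsilon:
        return random.choice(actions)           # exploration: accumulate new knowledge
    return max(Q[state], key=Q[state].get)      # exploitation: use existing knowledge

def decay_epsilon(epsilon):
    """Epsilon-decay: gradually reduce exploration so the learning process converges."""
    return epsilon * EPSILON_DECAY
```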
Policy Update: Basic Algorithms
Q-Learning: "off-policy"
• Updates without taking the policy learned so far into account
SARSA: "on-policy"
• Updates using knowledge of the policy learned so far
Hyperparameter γ: "discount factor"
Hyperparameter α: "learning rate"
(The standard update rules follow below.)
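The update rules behind the two algorithms are the standard formulations from [Sutton & Barto, 2018], with learning rate α and discount factor γ:

```latex
% Q-Learning (off-policy): bootstraps on the greedy next action
Q(S,A) \leftarrow Q(S,A) + \alpha \left[ R + \gamma \max_{a} Q(S',a) - Q(S,A) \right]

% SARSA (on-policy): bootstraps on the next action A' actually chosen by the learned policy
Q(S,A) \leftarrow Q(S,A) + \alpha \left[ R + \gamma \, Q(S',A') - Q(S,A) \right]
```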
Online RL for Adaptive Systems
Combining MAPE-K and RL [Palm et al. 2020; Metzger et al. 2022]
(Diagram: the MAPE-K loop and the RL agent — policy, action selection, policy update — interacting with the environment via state S, action A, reward R, and next state S'.)
Self-adaptation logic realized via reinforcement learning:
• Policy = Knowledge
• Action Selection = Analyze + Plan
• Monitor provides state S and next state S'
• Execute applies action A
• Policy Update learns from reward R
Action = adaptation decision
Reward = how good was the respective adaptation decision?
(A sketch of one step of this combined loop follows below.)
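A sketch of one learning step of the combined loop, in which the action is an adaptation decision and the reward rates how good the previous decision was. All concrete names (observe_state, compute_reward, apply_adaptation) are illustrative assumptions.

```python
# One step of MAPE-K realized via online RL (sketch; interface names are assumed).

def rl_adaptation_step(system, policy, prev):
    """prev = (state, action) from the previous step, or None at start-up."""
    state = system.observe_state()             # Monitor -> state S (also next state S')
    if prev is not None:
        prev_state, prev_action = prev
        reward = system.compute_reward()       # how good was the previous adaptation decision?
        policy.update(prev_state, prev_action, reward, state)   # Policy Update
    action = policy.select_action(state)       # Action Selection (Analyze + Plan)
    system.apply_adaptation(action)            # Execute -> adaptation decision
    return (state, action)
```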
Agenda
1. Challenges in Engineering Adaptive Systems
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Problem when Using Online RL
Exploring a large number of discrete adaptation options
• Example: service-oriented system
• 8 abstract services, each with 2 concrete services
• 2^8 = 256 discrete adaptation options
State of the art for adaptive systems (e.g., [Xu et al. 2012; Jamshidi et al. 2015; Arabnejad et al., 2017; Wang et al. 2020])
• Use of ε-greedy for the exploration-exploitation tradeoff
• Exploration happens randomly
→ Slow learning when the number of adaptation options is large
(see also, e.g., [Filho & Porter, 2017; Dulac-Arnold et al., 2015])
Solution Approach
Feature-model-guided learning strategies for systematic exploration [Metzger et al., 2020a; Metzger et al., 2022]
• Explicitly model the adaptation options in a feature model, as known from software product line engineering [Metzger & Pohl, 2014]
• Explore by exploiting the structure of the feature model
(Diagram: self-adaptation logic realized via reinforcement learning — Monitor, Action Selection (Analyze + Plan), Policy (Knowledge), Policy Update, Execute — with action selection now guided by a feature model; interaction via state s, action a, reward r, next state s'.)
Feature Models for Specifying the Adaptation Options
Example: feature model (FM) of a web shop
• FM = compact specification of the permissible system configurations
• Concrete system configuration = combination of activated features
• Adaptation = change of the concrete system configuration at runtime
(Diagram: feature tree Web Shop → Data Logging {Min, Medium, Max; alternative} and Content Discovery → {Search, Recommendation; optional}; legend: mandatory, optional, alternative, activated. Cross-tree constraint: Recommendation ⇒ Max ∨ Medium. Adaptation trigger: "Nbr of Concurrent Users ≥ 1000" → the set of activated features is changed.)
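A minimal sketch of how such a feature model and its permissible configurations could be represented. The structure mirrors the web-shop example above, while the encoding itself (dicts plus a constraint function) and the exact constraint direction are assumptions for illustration.

```python
from itertools import product

# Web-shop feature model (sketch): "Data Logging" is an alternative group,
# "Search" and "Recommendation" are optional features.
DATA_LOGGING = ["Min", "Medium", "Max"]      # exactly one must be active
OPTIONAL = ["Search", "Recommendation"]      # each may be active or not

def is_valid(config):
    """Assumed cross-tree constraint: Recommendation requires Medium or Max logging."""
    return not ("Recommendation" in config["features"]
                and config["data_logging"] == "Min")

def all_configurations():
    """Enumerate the permissible configurations = the discrete adaptation options."""
    configs = []
    for logging in DATA_LOGGING:
        for flags in product([False, True], repeat=len(OPTIONAL)):
            features = {f for f, on in zip(OPTIONAL, flags) if on}
            config = {"data_logging": logging, "features": features}
            if is_valid(config):
                configs.append(config)
    return configs

# An adaptation is then a change of the currently active configuration at runtime.
```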
FM-Guided Exploration
(Feature model as on the previous slide: Web Shop → Data Logging {Min, Medium, Max}, Content Discovery → {Search, Recommendation}; constraint: Recommendation ⇒ Max ∨ Medium)
State of the art: ε-greedy (random exploration)
FM-guided: the FM-structure strategy (see the sketch below)
1. Start at a randomly selected leaf feature
2. Explore the configurations that contain this leaf feature…
3. …and only then explore the configurations that contain its "sibling" feature
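A sketch of the FM-structure exploration idea described above: pick a random leaf feature, exhaust the configurations containing it, then move on to its siblings. This is a simplified illustration of the strategy from [Metzger et al., 2022], not the published algorithm; the parent/leaf grouping and the configuration encoding are assumptions.

```python
import random

# Leaf features grouped by their parent feature (assumed encoding of the FM structure);
# configurations are represented as sets of feature names.
LEAVES_BY_PARENT = {
    "Data Logging": ["Min", "Medium", "Max"],
    "Content Discovery": ["Search", "Recommendation"],
}

def fm_structure_next(configurations, already_explored):
    """Pick the next configuration to explore, guided by the feature-model structure (sketch)."""
    parent = random.choice(list(LEAVES_BY_PARENT))
    leaves = LEAVES_BY_PARENT[parent]
    start = random.choice(leaves)                      # 1. randomly selected leaf feature
    siblings = [f for f in leaves if f != start]
    for feature in [start] + siblings:                 # 2. this leaf first, 3. then its siblings
        candidates = [c for c in configurations
                      if feature in c and c not in already_explored]
        if candidates:
            return random.choice(candidates)
    return None  # all configurations with these leaf features have been explored
```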
Validation
Systems:
                              CloudRM [Mann, 2016]    BerkeleyDB-J [Siegmund et al. 2012]
Features                      63                      26
Number of adaptations         344                     180
Depth of the feature model    3                       5

Measuring learning performance
• 500 repetitions because of stochastic effects
• "Reward" metrics following [Taylor & Stone, 2009]
(Schematic chart: reward over time steps, indicating initial performance, asymptotic performance, time to threshold (here: 90% of the max-min performance), and total performance; a sketch of these metrics follows below.)
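The learning-performance metrics can be read off such a reward-over-time curve. A sketch of plausible definitions on a per-time-step reward trace; the exact definitions follow [Taylor & Stone, 2009], and the averaging window used here is an assumption.

```python
def learning_metrics(rewards, window=100):
    """Compute reward metrics from a reward trace (sketch; definitions assumed)."""
    initial = sum(rewards[:window]) / len(rewards[:window])      # Initial Performance
    asymptotic = sum(rewards[-window:]) / len(rewards[-window:]) # Asymptotic Performance
    total = sum(rewards)                                         # Total Performance (area under the curve)
    threshold = min(rewards) + 0.9 * (max(rewards) - min(rewards))  # 90% of the max-min range
    time_to_threshold = next((t for t, r in enumerate(rewards) if r >= threshold), None)
    return initial, asymptotic, total, time_to_threshold
```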
Validation
Results
Effect of the FM characteristics
• Larger improvement for CloudRM, since it has a considerably larger number of adaptation options
Effect of the learning algorithm
• Larger improvements with SARSA
• But: the absolute learning performance of SARSA is well below that of Q-Learning
• Reason: SARSA avoids risky adaptations (cf. the "safe path" in the cliff walk) → slower learning

Improvement over ε-greedy    Average   Q-Learning   SARSA    SARSA vs. Q-Learning (absolute)
Asymptotic Performance       0.3%      -0.4%        1.1%     -3.8%
Time to Threshold            25.4%     15.1%        35.8%    -27.6%
Total Performance            33.7%     24.2%        43.2%    -23.0%
Agenda
1. Challenges in Engineering Adaptive Systeme
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Problem when Using Online RL
Exploration vs. exploitation under non-stationarity
• Example: cloud application
• The CPU performance of the cloud hardware changes over time
• This affects the performance of the cloud application
State of the art for adaptive systems (e.g., [Xu et al. 2012; Jamshidi et al. 2015; Arabnejad et al., 2017; Wang et al. 2020])
• Use of ε-decay so that the learning process converges
• If ε is small, there is too little exploration to capture non-stationarity
→ Requires detecting non-stationarity at runtime and increasing ε again accordingly
Basic Idea
Deep RL for automatically adjusting exploration [Palm et al. 2020; Metzger et al. 2020b]
Knowledge: neural network instead of the action-value function Q
• Generalizes over states that have not been observed so far
Action Selection: stochastic sampling
• No need to "tune" the exploration-exploitation tradeoff
• Adjusts automatically, in particular under non-stationarity
Policy Update: gradient method
• The typical approach for learning the weights of the neural network
(Diagram: self-adaptation logic realized via reinforcement learning — Monitor, Action Selection (sampling), Policy (neural network), Policy Update (gradient method), Execute (adaptation); interaction via state s, action a, reward r, next state s'. A sketch follows below.)
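A compact sketch of the idea: a (here: linear, single-layer) policy "network" outputs a probability distribution over adaptation actions, the action is drawn by stochastic sampling, so exploration adjusts automatically to the policy's uncertainty, and the weights are updated with a policy-gradient step. This is a generic REINFORCE-style illustration in plain NumPy, not the implementation from [Palm et al. 2020]; the dimensions and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE_FEATURES, N_ACTIONS = 4, 3   # assumed dimensions
W = 0.01 * rng.standard_normal((N_ACTIONS, N_STATE_FEATURES))  # policy weights
ALPHA = 0.01                         # learning rate (assumed)

def action_probabilities(state):
    """Softmax policy over adaptation actions."""
    logits = W @ state
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def select_action(state):
    """Stochastic sampling: exploration follows the policy's own uncertainty."""
    return rng.choice(N_ACTIONS, p=action_probabilities(state))

def policy_gradient_update(state, action, reward):
    """One REINFORCE-style step: raise the log-probability of the taken action
    in proportion to the observed reward (gradient ascent on expected reward)."""
    global W
    probs = action_probabilities(state)
    grad_log_pi = -np.outer(probs, state)   # d log pi(a|s) / dW, softmax part ...
    grad_log_pi[action] += state            # ... plus the indicator for the taken action
    W += ALPHA * reward * grad_log_pi
```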
Validation
Systems
• Brownout-RUBiS: adaptive web shop [Klein et al. 2014]
• PrBPM: proactive business process monitoring system [Metzger et al. 2019]
Validation
(Charts: Brownout-RUBiS — dimmer value, response time, and reward over time, under a changing workload and a drop of the CPU performance from 100% to 50%. PrBPM — rate of adaptation decisions, rate of correct adaptation decisions, earliness, prediction accuracy (avg. MAE over the last 100 cases), and reward over time.)
→ Automatic handling of non-stationarity
Agenda
1. Challenges in Engineering Adaptive Systems
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Discussion and Outlook
Online RL is not suitable for all types of systems
• Risky if "wrong" adaptations cause damage
→ Safe reinforcement learning for safe exploration
• Can be manipulated by "adversarial" input from the environment ("faked" observations)
→ Adversarial machine learning to increase robustness against attacks
Low initial performance of online RL
• Even simple/known relationships have to be learned from scratch at the beginning
→ Meta-RL to reuse knowledge learned in "related" environments
Reward-engineering problem of RL in general
• Correctly formulating the reward function is essential for "learning success"
• It is not transparent what RL learns (especially with deep RL)
→ Explainable machine learning for "debugging" the reward function
Thank you!
Research leading to these results has received funding from the EU’s Horizon 2020 research and
innovation programme under grant agreements no.
780351 – www.enact-project.eu
731932 – www.transformingtransport.eu
871493 – www.dataports-project.eu
Foundational literature
• D. Weyns, Introduction to Self-Adaptive Systems: A Contemporary Software Engineering Perspective, Wiley, 2021
• R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018
Further reading
Exploring a large number of adaptation options
• A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Realizing Self-Adaptive Systems via Online Reinforcement
Learning and Feature-Model-guided Exploration”, Computing, Springer, March, 2022
• A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, “Feature model-guided online reinforcement learning for
self-adaptive services,” in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020), LNCS 12571, Springer, 2020
Exploration vs. exploitation under non-stationarity
• A. Palm, A. Metzger, and K. Pohl, “Online reinforcement learning for self-adaptive information systems,” in 32nd
Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS 12127. Springer, 2020
• A. Metzger, T. Kley, and A. Palm, “Triggering proactive business process adaptations via online reinforcement
learning,” in 18th Int’l Conf. on Business Process Management (BPM 2020), LNCS 12168. Springer, 2020
References
[Arabnejad et al., 2017] H. Arabnejad, C. Pahl, P. Jamshidi, and G. Estrada, “A comparison of reinforcement learning techniques for
fuzzy cloud autoscaling,” in 17th Intl Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017
[De Lemos et al. 2010] R. de Lemos et al., “Software Engineering for Self-Adaptive Systems: A Second Research Roadmap,” in Softw.
Eng. for Self-Adaptive Systems II, ser. LNCS. Springer, 2013, vol. 7475, pp. 1–32
[Di Francescomarino et al. 2018] Chiara Di Francescomarino, Chiara Ghidini, Fabrizio Maria Maggi, Fredrik Milani: Predictive Process
Monitoring Methods: Which One Suits Me Best? BPM 2018: 462-479
[Dulac-Arnold et al. 2015] Gabriel Dulac-Arnold, Richard Evans, Peter Sunehag, Ben Coppin: Reinforcement Learning in Large Discrete
Action Spaces. CoRR abs/1512.07679 (2015)
[Evermann et al. 2017] Evermann, J., Rehse, J., Fettke, P.: Predicting process behaviour using deep learning. Decision Support Systems
100, 2017
[Filho & Porter, 2017] Filho, R.V.R., Porter, B.: Defining emergent software using continuous self-assembly, perception, and learning.
TAAS 12(3), 16:1–16:25 (2017)
[Jamshidi et al., 2015] P. Jamshidi, A. Molzam Sharifloo, C. Pahl, A. Metzger, and G. Estrada, “Self-learning cloud controllers: Fuzzy Q-
learning for knowledge evolution (short paper),” in Int’l Conference on Cloud and Autonomic Computing (IC- CAC 2015) Cambridge,
USA, September 21-24, 2015,
[Kephart & Chess, 2003] J. O. Kephart and D. M. Chess, “The vision of autonomic computing,” IEEE Computer, vol. 36, no. 1, pp. 41–50,
2003.
[Klein et al. 2014] C. Klein, M. Maggio, K. Arzen, F. Hernandez-Rodriguez, “Brownout: building more robust cloud applications”. In:
36th Intl Conf. on Software Engineering (ICSE 2014), pp. 700–711. ACM, 2014
[Mann, 2016] Z. Mann, “Interplay of virtual machine selection and virtual machine placement”, in: 5th European Conf. on Service-
Oriented and Cloud Computing, ESOCC’16, LNCS vol. 9846, pp. 137–151 (2016)
[Metzger & Pohl, 2014] A. Metzger, K. Pohl, “Software product line engineering and variability management: Achievements and
challenges,” in ICSE Future of Software Engineering Track (FOSE 2014), ACM, 2014, pp. 70–84.
References
[Metzger et al. 2019] A. Metzger, A. Neubauer, P. Bohn, and K. Pohl, “Proactive process adaptation using deep learning ensembles,” in
31st Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2019), LNCS, vol. 11483. Springer, 2019, pp. 547–562
[Metzger et al. 2022] A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Realizing Self-Adaptive Systems via Online Reinforcement
Learning and Feature-Model-guided Exploration”, Computing, Springer, March 2022
[Metzger et al. 2020a] A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, “Feature model-guided online reinforcement learning
for self-adaptive services,” in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020), LNCS 12571, Springer, 2020
[Metzger et al. 2020b] A. Metzger, T. Kley, and A. Palm, “Triggering proactive business process adaptations via online reinforcement
learning,” in 18th Int’l Conf. on Business Process Management (BPM 2020), LNCS 12168. Springer, 2020
[Metzger, 2021] A. Metzger, “Software Engineering for ECS: Towards Dev-Ops-Adapt” (presentation slides), Workshop on Software in
Electronics, Components and Systems-based Digitisation, Virtual, May 2021
[Palm et al. 2020] A. Palm, A. Metzger, and K. Pohl, “Online reinforcement learning for self-adaptive information systems,” in 32nd Int’l
Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS 12127. Springer, 2020
[Salehie & Tahvildari, 2009] M. Salehie and L. Tahvildari, “Self-adaptive software: Landscape and research challenges,” TAAS, vol. 4, no.
2, 2009.
References
[Siegmund et al. 2012] N. Siegmund, S. Kolesnikov, C. Kästner, S. Apel, D. Batory, M. Rosenmüller, G. Saake: Predicting Performance
via Automated Feature-interaction Detection. In: 34th Intl Conf. on Software Engineering (ICSE 2012), pp. 167–177, ACM, 2012
[Sutton & Barto, 2018] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press,
2018
[Taylor & Stone, 2009] M. Taylor, P. Stone: Transfer learning for reinforcement learning domains: A survey. J. Mach. Learn. Res. 10,
1633–1685 (2009)
[Wang et al., 2020] Hongbing Wang, Jiajie Li, Qi Yu, Tianjing Hong, Jia Yan, Wei Zhao: Integrating recurrent neural networks and
reinforcement learning for dynamic service composition. Future Gener. Comput. Syst. 107: 551-563 (2020)
[Weyns et al. 2013] Danny Weyns, Nelly Bencomo, Radu Calinescu, Javier Cámara, Carlo Ghezzi, Vincenzo Grassi, Lars Grunske, Paola
Inverardi, Jean-Marc Jézéquel, Sam Malek, Raffaela Mirandola, Marco Mori, Giordano Tamburrelli: Perpetual Assurances for Self-
Adaptive Systems. Software Engineering for Self-Adaptive Systems 2013: 31-63
[Weyns, 2021] Danny Weyns, Introduction to Self-Adaptive Systems: A Contemporary Software Engineering Perspective, Wiley, 2021.
[Xu et al., 2012] C. Xu, J. Rao, and X. Bu, “URL: A unified reinforcement learning approach for autonomic cloud management,” J.
Parallel Distrib. Comput., vol. 72, no. 2, pp. 95–105, 2012
Speaker Notes
  1. Observing changes in the environment, the requirements, and the system itself; adapting structure, parameters, and behavior. -- M. Papazoglou, K. Pohl, M. Parkin, and A. Metzger, Eds., Service Research Challenges and Solutions for the Future Internet: S-Cube – Towards Mechanisms and Methods for Engineering, Managing, and Adapting Service-Based Systems, ser. LNCS. Heidelberg, Germany: Springer, 2010, vol. 6500.
  2. Observing changes in the environment, the requirements, and the system itself; adapting structure, parameters, and behavior. -- M. Papazoglou, K. Pohl, M. Parkin, and A. Metzger, Eds., Service Research Challenges and Solutions for the Future Internet: S-Cube – Towards Mechanisms and Methods for Engineering, Managing, and Adapting Service-Based Systems, ser. LNCS. Heidelberg, Germany: Springer, 2010, vol. 6500.
  3. Despite these possibilities, concrete problems arise when using ML for adaptive systems; I will discuss two of them in more detail in the remainder of the talk… -- Bradley Schmerl, David Garlan, Christian Kästner - CMU; Danny Weyns – U Leuven; Pooyan Jamshidi – U South Carolina; Javier Camara – U York; Hongbing Wang – U Nanjing; Sven Tomforde – U Kiel --- N. Esfahani, E. Kouroshfar, and S. Malek, “Taming Uncertainty in Self-adaptive Software,” in Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ser. ESEC/FSE ’11, 2011, pp. 234–244. -- A. J. Ramirez, A. C. Jensen, and B. H. C. Cheng, “A taxonomy of uncertainty for dynamically adaptive systems,” in 7th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS, 2012, pp. 99–108. --
  4. Despite these possibilities, concrete problems arise when using ML for adaptive systems; I will discuss two of them in more detail in the remainder of the talk… -- Bradley Schmerl, David Garlan, Christian Kästner - CMU; Danny Weyns – U Leuven; Pooyan Jamshidi – U South Carolina; Javier Camara – U York; Hongbing Wang – U Nanjing; Sven Tomforde – U Kiel --- N. Esfahani, E. Kouroshfar, and S. Malek, “Taming Uncertainty in Self-adaptive Software,” in Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ser. ESEC/FSE ’11, 2011, pp. 234–244. -- A. J. Ramirez, A. C. Jensen, and B. H. C. Cheng, “A taxonomy of uncertainty for dynamically adaptive systems,” in 7th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS, 2012, pp. 99–108. --
  5. Q-Learning: risk of preferring “dangerous” actions. SARSA: is aware of dangerous actions, hence the “safe” path. There are two hyper-parameters: the learning rate α, which defines to what extent newly acquired knowledge overwrites old knowledge, and the discount factor γ, which defines the relevance of future rewards.
  6. The contributions were published, among others, at the SEAMS Symposium and most recently at ICSOC, where we received the Best Paper Award.
  7. Strategy exploits semantics typically encoded in feature models. Non-leaf features are usually abstract features, which delegate their realization to their sub-features. Sub-features thus may offer different realizations of their abstract parent feature. If no configuration containing f or a sibling feature of f is found, then the strategy moves on to the parent feature of f, which is repeated until a configuration is found (line 13) or the root feature is reached (line 22).
  8. We used an ε decay rate of 0.97 (i.e., ε < 1% after time step 150), as this led to fastest convergence with highest asymptotic rewards for ε-greedy. Reason: many configurations with very similar performance.
  9. The contributions were published, among others, at the SEAMS Symposium and most recently at ICSOC, where we received the Best Paper Award.