Online Reinforcement Learning for Adaptive Systems
Andreas Metzger
Inaugural lecture on the occasion of being awarded the title of außerplanmäßiger Professor (adjunct professor)
Essen, 26 April 2022
Agenda
1. Challenges in Engineering Adaptive Systems
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Fundamentals
(Self-)adaptive software system [Salehie & Tahvildari, 2009; Weyns, 2021]
• Observes changes in its environment, its requirements, and itself
• Modifies its structure, parameters, and behavior
Example software life-cycle model [Metzger, 2021]
(Diagram: DEV, OPS, and ADAPT phases; during operation the system self-observes and self-modifies.)
Fundamentals
MAPE-K reference model [Kephart & Chess, 2003; Salehie & Tahvildari, 2009]
Example: adaptive web shop
• Monitor: drastic increase in users (workload)
• Analyze: response time of the web shop is too slow
• Plan: deactivate the optional recommendation functionality
• Execute: replace dynamic recommendations with a static banner
(Diagram: the self-adaptation logic — Monitor, Analyze, Plan, Execute around a shared Knowledge — sits on top of the system logic and is connected to it via sensors and effectors. Monitor: collect and aggregate observation data; Analyze: determine the need for adaptation; Plan: derive concrete adaptations; Execute: carry out the adaptations. A minimal code sketch of this loop follows below.)
(Chart: workload of the web shop over time.)
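To make the MAPE-K loop concrete, here is a minimal sketch in Python of how the web-shop example above could be wired up. All names (the sensor/effector methods, the response-time threshold) are illustrative assumptions, not part of any real framework.

```python
# Minimal MAPE-K sketch for the adaptive web-shop example.
# Sensor/effector interfaces and the threshold value are assumed.

def monitor(sensors):
    """Monitor: collect and aggregate observation data."""
    return {
        "workload": sensors.current_users(),
        "response_time_ms": sensors.avg_response_time_ms(),
    }

def analyze(state, knowledge):
    """Analyze: determine whether an adaptation is needed."""
    return state["response_time_ms"] > knowledge["response_time_limit_ms"]

def plan(state):
    """Plan: derive a concrete adaptation (switch off optional recommendations)."""
    return {"recommendations": "static_banner"}

def execute(adaptation, effectors):
    """Execute: carry out the adaptation via the system's effectors."""
    if adaptation["recommendations"] == "static_banner":
        effectors.replace_dynamic_recommendations_with_banner()

def mape_k_step(sensors, effectors, knowledge):
    state = monitor(sensors)
    if analyze(state, knowledge):
        execute(plan(state), effectors)
```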
Engineering Adaptive Systems
Challenge: "design time uncertainty" [Weyns et al. 2013; Weyns, 2021]
Anticipating possible environment situations
• Which possible environment states should the adaptive system react to?
• Example: different workloads of the web shop
Knowing the effects of adaptations on the system
• What exact effect does which adaptation have in which environment situation?
• Which adaptation is suitable in each case?
• Example: what is the concrete effect of switching off the dynamic recommendations on the response time?
Dealing with non-stationarity ("concept drift")
• Which effects of which adaptations change over time?
• Example: the cloud provider migrates to more powerful machines
→ An adaptation of the web shop then affects the response time differently than before the migration
Agenda
1. Challenges in Engineering Adaptive Systems
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Online Reinforcement Learning
Online reinforcement learning as a solution approach for "design time uncertainty"
[Xu et al. 2012; Jamshidi et al. 2015; Arabnejad et al., 2017; Wang et al. 2020]
• Use reinforcement learning at runtime
• Learn from concrete observations (data, feedback)
(Diagram: the MAPE-K self-adaptation logic on top of the system logic with sensors and effectors, extended by a Learn component that receives feedback and updates the Knowledge.)
Reinforcement Learning (RL)
Basic "model" [Sutton & Barto, 2018]
Goal of RL: maximize the cumulative reward
(Diagram, based on [Sutton & Barto, 2018]: the agent holds a policy and performs action selection and policy updates; the environment receives action A and returns reward R and next state S'.)
Standard example: "cliff walk" [Sutton & Barto, 2018]
• States: the cells of a grid world
• Actions = {UP, DOWN, LEFT, RIGHT}
• Reward: a small negative reward per step, a large negative reward for stepping off the cliff
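The agent–environment interaction can be written as the following generic loop; a sketch in Python following [Sutton & Barto, 2018], where `env`, `select_action`, and `update_policy` are assumed interfaces rather than a specific RL library.

```python
# Generic online RL interaction loop (sketch; interfaces are assumed).

def run_episode(env, select_action, update_policy):
    state = env.reset()
    done = False
    cumulative_reward = 0.0
    while not done:
        action = select_action(state)                     # Action Selection
        next_state, reward, done = env.step(action)       # environment returns R and S'
        update_policy(state, action, reward, next_state)  # Policy Update
        cumulative_reward += reward                       # goal: maximize cumulative reward
        state = next_state
    return cumulative_reward
```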
Policy
Basic representation
Action-value function Q(S, A) = expected cumulative reward of taking action A in state S
(Tables, "cliff walk" example: the learned Q-values — one row per state, one column per action {UP, RIGHT, DOWN, LEFT} — shown at several points during learning, starting from a table of all zeros before learning.)
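In the tabular case the policy's knowledge is exactly such a table: one Q-value per state-action pair. A minimal sketch for the standard 4×12 cliff-walk grid, with values initialized to zero as in the last table above (the grid size follows the textbook example and is an assumption here):

```python
# Tabular action-value function Q(S, A) for the cliff-walk example (sketch).
ACTIONS = ["UP", "RIGHT", "DOWN", "LEFT"]
N_STATES = 4 * 12  # standard 4x12 cliff-walk grid

# Q[s][a] = expected cumulative reward of taking action a in state s,
# initialized to 0 before any learning has happened.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

def greedy_action(state):
    """Return the action with the highest current Q-value in the given state."""
    return max(Q[state], key=Q[state].get)
```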
Action Selection
Principles
• Exploration = accumulating new knowledge
• Exploitation = using existing knowledge
Exploitation-exploration tradeoff
• Per learning step: either exploitation or exploration
• Exploitation maximizes the reward in that single step
• Exploration maximizes the (long-term) cumulative reward
Standard techniques (see the sketch below)
• ε-greedy: with probability ε, explore by choosing a random action; with probability (1 − ε), exploit by choosing the action that is best according to Q (the greedy action)
• ε-decay: step-wise reduction of ε so that the learning process converges
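A minimal sketch of ε-greedy action selection with ε-decay, reusing the tabular Q from above; the initial ε and the decay rate are assumed values for illustration.

```python
import random

EPSILON = 1.0         # initial exploration probability (assumed)
EPSILON_DECAY = 0.97  # decay rate per learning step (assumed)

def epsilon_greedy(state, Q, actions, epsilon):
    """With probability epsilon explore (random action), otherwise exploit (greedy action)."""
    if random.random() < epsilon:
        return random.choice(actions)           # exploration: accumulate new knowledge
    return max(Q[state], key=Q[state].get)      # exploitation: use existing knowledge

def decay_epsilon(epsilon):
    """Epsilon-decay: gradually reduce exploration so the learning process converges."""
    return epsilon * EPSILON_DECAY
```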
Policy Update: Basic Algorithms
Q-Learning: "off-policy"
• Updates without taking the policy learned so far into account
SARSA: "on-policy"
• Updates using knowledge of the policy learned so far
Hyperparameter γ: "discount factor"
Hyperparameter α: "learning rate"
(The standard update rules follow below.)
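The update rules behind the two algorithms are the standard formulations from [Sutton & Barto, 2018], with learning rate α and discount factor γ:

```latex
% Q-Learning (off-policy): bootstraps on the greedy next action
Q(S,A) \leftarrow Q(S,A) + \alpha \left[ R + \gamma \max_{a} Q(S',a) - Q(S,A) \right]

% SARSA (on-policy): bootstraps on the next action A' actually chosen by the learned policy
Q(S,A) \leftarrow Q(S,A) + \alpha \left[ R + \gamma \, Q(S',A') - Q(S,A) \right]
```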
Online RL for Adaptive Systems
Combining MAPE-K and RL [Palm et al. 2020; Metzger et al. 2022]
(Diagram: the MAPE-K loop and the RL agent — policy, action selection, policy update — interacting with the environment via state S, action A, reward R, and next state S'.)
Self-adaptation logic realized via reinforcement learning:
• Policy = Knowledge
• Action Selection = Analyze + Plan
• Monitor provides state S and next state S'
• Execute applies action A
• Policy Update learns from reward R
Action = adaptation decision
Reward = how good was the respective adaptation decision?
(A sketch of one step of this combined loop follows below.)
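A sketch of one learning step of the combined loop, in which the action is an adaptation decision and the reward rates how good the previous decision was. All concrete names (observe_state, compute_reward, apply_adaptation) are illustrative assumptions.

```python
# One step of MAPE-K realized via online RL (sketch; interface names are assumed).

def rl_adaptation_step(system, policy, prev):
    """prev = (state, action) from the previous step, or None at start-up."""
    state = system.observe_state()             # Monitor -> state S (also next state S')
    if prev is not None:
        prev_state, prev_action = prev
        reward = system.compute_reward()       # how good was the previous adaptation decision?
        policy.update(prev_state, prev_action, reward, state)   # Policy Update
    action = policy.select_action(state)       # Action Selection (Analyze + Plan)
    system.apply_adaptation(action)            # Execute -> adaptation decision
    return (state, action)
```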
Agenda
1. Challenges in Engineering Adaptive Systems
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Problem when Using Online RL
Exploring a large number of discrete adaptation options
• Example: service-oriented system
• 8 abstract services, each with 2 concrete services
• 2^8 = 256 discrete adaptation options
State of the art for adaptive systems (e.g., [Xu et al. 2012; Jamshidi et al. 2015; Arabnejad et al., 2017; Wang et al. 2020])
• Use of ε-greedy for the exploration-exploitation tradeoff
• Exploration happens randomly
→ Slow learning when the number of adaptation options is large
(see also, e.g., [Filho & Porter, 2017; Dulac-Arnold et al., 2015])
Solution Approach
Feature-model-guided learning strategies for systematic exploration [Metzger et al., 2020a; Metzger et al., 2022]
• Explicitly model the adaptation options in a feature model, as known from software product line engineering [Metzger & Pohl, 2014]
• Explore by exploiting the structure of the feature model
(Diagram: self-adaptation logic realized via reinforcement learning — Monitor, Action Selection (Analyze + Plan), Policy (Knowledge), Policy Update, Execute — with action selection now guided by a feature model; interaction via state s, action a, reward r, next state s'.)
Feature Models for Specifying the Adaptation Options
Example: feature model (FM) of a web shop
• FM = compact specification of the permissible system configurations
• Concrete system configuration = combination of activated features
• Adaptation = change of the concrete system configuration at runtime
(Diagram: feature tree Web Shop → Data Logging {Min, Medium, Max; alternative} and Content Discovery → {Search, Recommendation; optional}; legend: mandatory, optional, alternative, activated. Cross-tree constraint: Recommendation ⇒ Max ∨ Medium. Adaptation trigger: "Nbr of Concurrent Users ≥ 1000" → the set of activated features is changed.)
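A minimal sketch of how such a feature model and its permissible configurations could be represented. The structure mirrors the web-shop example above, while the encoding itself (dicts plus a constraint function) and the exact constraint direction are assumptions for illustration.

```python
from itertools import product

# Web-shop feature model (sketch): "Data Logging" is an alternative group,
# "Search" and "Recommendation" are optional features.
DATA_LOGGING = ["Min", "Medium", "Max"]      # exactly one must be active
OPTIONAL = ["Search", "Recommendation"]      # each may be active or not

def is_valid(config):
    """Assumed cross-tree constraint: Recommendation requires Medium or Max logging."""
    return not ("Recommendation" in config["features"]
                and config["data_logging"] == "Min")

def all_configurations():
    """Enumerate the permissible configurations = the discrete adaptation options."""
    configs = []
    for logging in DATA_LOGGING:
        for flags in product([False, True], repeat=len(OPTIONAL)):
            features = {f for f, on in zip(OPTIONAL, flags) if on}
            config = {"data_logging": logging, "features": features}
            if is_valid(config):
                configs.append(config)
    return configs

# An adaptation is then a change of the currently active configuration at runtime.
```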
FM-Guided Exploration
(Feature model as on the previous slide: Web Shop → Data Logging {Min, Medium, Max}, Content Discovery → {Search, Recommendation}; constraint: Recommendation ⇒ Max ∨ Medium)
State of the art: ε-greedy (random exploration)
FM-guided: the FM-structure strategy (see the sketch below)
1. Start at a randomly selected leaf feature
2. Explore the configurations that contain this leaf feature…
3. …and only then explore the configurations that contain its "sibling" feature
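A sketch of the FM-structure exploration idea described above: pick a random leaf feature, exhaust the configurations containing it, then move on to its siblings. This is a simplified illustration of the strategy from [Metzger et al., 2022], not the published algorithm; the parent/leaf grouping and the configuration encoding are assumptions.

```python
import random

# Leaf features grouped by their parent feature (assumed encoding of the FM structure);
# configurations are represented as sets of feature names.
LEAVES_BY_PARENT = {
    "Data Logging": ["Min", "Medium", "Max"],
    "Content Discovery": ["Search", "Recommendation"],
}

def fm_structure_next(configurations, already_explored):
    """Pick the next configuration to explore, guided by the feature-model structure (sketch)."""
    parent = random.choice(list(LEAVES_BY_PARENT))
    leaves = LEAVES_BY_PARENT[parent]
    start = random.choice(leaves)                      # 1. randomly selected leaf feature
    siblings = [f for f in leaves if f != start]
    for feature in [start] + siblings:                 # 2. this leaf first, 3. then its siblings
        candidates = [c for c in configurations
                      if feature in c and c not in already_explored]
        if candidates:
            return random.choice(candidates)
    return None  # all configurations with these leaf features have been explored
```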
Validation
Systems:
                              CloudRM [Mann, 2016]    BerkeleyDB-J [Siegmund et al. 2012]
Features                      63                      26
Number of adaptations         344                     180
Depth of the feature model    3                       5

Measuring learning performance
• 500 repetitions because of stochastic effects
• "Reward" metrics following [Taylor & Stone, 2009]
(Schematic chart: reward over time steps, indicating initial performance, asymptotic performance, time to threshold (here: 90% of the max-min performance), and total performance; a sketch of these metrics follows below.)
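The learning-performance metrics can be read off such a reward-over-time curve. A sketch of plausible definitions on a per-time-step reward trace; the exact definitions follow [Taylor & Stone, 2009], and the averaging window used here is an assumption.

```python
def learning_metrics(rewards, window=100):
    """Compute reward metrics from a reward trace (sketch; definitions assumed)."""
    initial = sum(rewards[:window]) / len(rewards[:window])      # Initial Performance
    asymptotic = sum(rewards[-window:]) / len(rewards[-window:]) # Asymptotic Performance
    total = sum(rewards)                                         # Total Performance (area under the curve)
    threshold = min(rewards) + 0.9 * (max(rewards) - min(rewards))  # 90% of the max-min range
    time_to_threshold = next((t for t, r in enumerate(rewards) if r >= threshold), None)
    return initial, asymptotic, total, time_to_threshold
```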
Validation
Results
Effect of the FM characteristics
• Larger improvement for CloudRM, since it has a considerably larger number of adaptation options
Effect of the learning algorithm
• Larger improvements with SARSA
• But: the absolute learning performance of SARSA is well below that of Q-Learning
• Reason: SARSA avoids risky adaptations (cf. the "safe path" in the cliff walk) → slower learning

Improvement over ε-greedy    Average   Q-Learning   SARSA    SARSA vs. Q-Learning (absolute)
Asymptotic Performance       0.3%      -0.4%        1.1%     -3.8%
Time to Threshold            25.4%     15.1%        35.8%    -27.6%
Total Performance            33.7%     24.2%        43.2%    -23.0%
Agenda
1. Challenges in Engineering Adaptive Systeme
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Problem when Using Online RL
Exploration vs. exploitation under non-stationarity
• Example: cloud application
• The CPU performance of the cloud hardware changes over time
• This affects the performance of the cloud application
State of the art for adaptive systems (e.g., [Xu et al. 2012; Jamshidi et al. 2015; Arabnejad et al., 2017; Wang et al. 2020])
• Use of ε-decay so that the learning process converges
• If ε is small, there is too little exploration to capture non-stationarity
→ Requires detecting non-stationarity at runtime and increasing ε again accordingly
Basic Idea
Deep RL for automatically adjusting exploration [Palm et al. 2020; Metzger et al. 2020b]
Knowledge: neural network instead of the action-value function Q
• Generalizes over states that have not been observed so far
Action Selection: stochastic sampling
• No need to "tune" the exploration-exploitation tradeoff
• Adjusts automatically, in particular under non-stationarity
Policy Update: gradient method
• The typical approach for learning the weights of the neural network
(Diagram: self-adaptation logic realized via reinforcement learning — Monitor, Action Selection (sampling), Policy (neural network), Policy Update (gradient method), Execute (adaptation); interaction via state s, action a, reward r, next state s'. A sketch follows below.)
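A compact sketch of the idea: a (here: linear, single-layer) policy "network" outputs a probability distribution over adaptation actions, the action is drawn by stochastic sampling, so exploration adjusts automatically to the policy's uncertainty, and the weights are updated with a policy-gradient step. This is a generic REINFORCE-style illustration in plain NumPy, not the implementation from [Palm et al. 2020]; the dimensions and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATE_FEATURES, N_ACTIONS = 4, 3   # assumed dimensions
W = 0.01 * rng.standard_normal((N_ACTIONS, N_STATE_FEATURES))  # policy weights
ALPHA = 0.01                         # learning rate (assumed)

def action_probabilities(state):
    """Softmax policy over adaptation actions."""
    logits = W @ state
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def select_action(state):
    """Stochastic sampling: exploration follows the policy's own uncertainty."""
    return rng.choice(N_ACTIONS, p=action_probabilities(state))

def policy_gradient_update(state, action, reward):
    """One REINFORCE-style step: raise the log-probability of the taken action
    in proportion to the observed reward (gradient ascent on expected reward)."""
    global W
    probs = action_probabilities(state)
    grad_log_pi = -np.outer(probs, state)   # d log pi(a|s) / dW, softmax part ...
    grad_log_pi[action] += state            # ... plus the indicator for the taken action
    W += ALPHA * reward * grad_log_pi
```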
Validation
Systems
• Brownout-RUBiS: adaptive web shop [Klein et al. 2014]
• PrBPM: proactive business process monitoring system [Metzger et al. 2019]
Validation
(Charts: Brownout-RUBiS — dimmer value, response time, and reward over time, under a changing workload and a drop of the CPU performance from 100% to 50%. PrBPM — rate of adaptation decisions, rate of correct adaptation decisions, earliness, prediction accuracy (avg. MAE over the last 100 cases), and reward over time.)
→ Automatic handling of non-stationarity
Agenda
1. Challenges in Engineering Adaptive Systems
2. Online Reinforcement Learning for Adaptive Systems
3. Problem 1: Large Number of Adaptation Options
4. Problem 2: Non-Stationarity
5. Discussion and Outlook
Discussion and Outlook
Online RL is not suitable for all types of systems
• Risky if "wrong" adaptations cause damage
→ Safe reinforcement learning for safe exploration
• Can be manipulated by "adversarial" input from the environment ("faked" observations)
→ Adversarial machine learning to increase robustness against attacks
Low initial performance of online RL
• Even simple/known relationships have to be learned from scratch at the beginning
→ Meta-RL to reuse knowledge learned in "related" environments
Reward-engineering problem of RL in general
• Correctly formulating the reward function is essential for "learning success"
• It is not transparent what RL learns (especially with deep RL)
→ Explainable machine learning for "debugging" the reward function
Thank you!
Research leading to these results has received funding from the EU’s Horizon 2020 research and
innovation programme under grant agreements no.
780351 – www.enact-project.eu
731932 – www.transformingtransport.eu
871493 – www.dataports-project.eu
Foundational literature
• D. Weyns, Introduction to Self-Adaptive Systems: A Contemporary Software Engineering Perspective, Wiley, 2021
• R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018
Further reading
Exploring a large number of adaptation options
• A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Realizing Self-Adaptive Systems via Online Reinforcement
Learning and Feature-Model-guided Exploration”, Computing, Springer, March, 2022
• A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, “Feature model-guided online reinforcement learning for
self-adaptive services,” in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020), LNCS 12571, Springer, 2020
Exploration vs. exploitation under non-stationarity
• A. Palm, A. Metzger, and K. Pohl, “Online reinforcement learning for self-adaptive information systems,” in 32nd
Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS 12127. Springer, 2020
• A. Metzger, T. Kley, and A. Palm, “Triggering proactive business process adaptations via online reinforcement
learning,” in 18th Int’l Conf. on Business Process Management (BPM 2020), LNCS 12168. Springer, 2020
References
[Arabnejad et al., 2017] H. Arabnejad, C. Pahl, P. Jamshidi, and G. Estrada, “A comparison of reinforcement learning techniques for
fuzzy cloud autoscaling,” in 17th Intl Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017
[De Lemos et al. 2010] R. de Lemos et al., “Software Engineering for Self-Adaptive Systems: A Second Research Roadmap,” in Softw.
Eng. for Self-Adaptive Systems II, ser. LNCS. Springer, 2013, vol. 7475, pp. 1–32
[Di Francescomarino et al. 2018] Chiara Di Francescomarino, Chiara Ghidini, Fabrizio Maria Maggi, Fredrik Milani: Predictive Process
Monitoring Methods: Which One Suits Me Best? BPM 2018: 462-479
[Dulac-Arnold et al. 2015] Gabriel Dulac-Arnold, Richard Evans, Peter Sunehag, Ben Coppin: Reinforcement Learning in Large Discrete
Action Spaces. CoRR abs/1512.07679 (2015)
[Evermann et al. 2017] Evermann, J., Rehse, J., Fettke, P.: Predicting process behaviour using deep learning. Decision Support Systems
100, 2017
[Filho & Porter, 2017] Filho, R.V.R., Porter, B.: Defining emergent software using continuous self-assembly, perception, and learning.
TAAS 12(3), 16:1–16:25 (2017)
[Jamshidi et al., 2015] P. Jamshidi, A. Molzam Sharifloo, C. Pahl, A. Metzger, and G. Estrada, “Self-learning cloud controllers: Fuzzy Q-
learning for knowledge evolution (short paper),” in Int’l Conference on Cloud and Autonomic Computing (IC- CAC 2015) Cambridge,
USA, September 21-24, 2015,
[Kephart & Chess, 2003] J. O. Kephart and D. M. Chess, “The vision of autonomic computing,” IEEE Computer, vol. 36, no. 1, pp. 41–50,
2003.
[Klein et al. 2014] C. Klein, M. Maggio, K. Arzen, F. Hernandez-Rodriguez, “Brownout: building more robust cloud applications”. In:
36th Intl Conf. on Software Engineering (ICSE 2014), pp. 700–711. ACM, 2014
[Mann, 2016] Z. Mann, “Interplay of virtual machine selection and virtual machine placement”, in: 5th European Conf. on Service-
Oriented and Cloud Computing, ESOCC’16, LNCS vol. 9846, pp. 137–151 (2016)
[Metzger & Pohl, 2014] A. Metzger, K. Pohl, “Software product line engineering and variability management: Achievements and
challenges,” in ICSE Future of Software Engineering Track (FOSE 2014), ACM, 2014, pp. 70–84.
References
[Metzger et al. 2019] A. Metzger, A. Neubauer, P. Bohn, and K. Pohl, “Proactive process adaptation using deep learning ensembles,” in
31st Int’l Conf. on Advanced Information Systems Engineering (CAiSE 2019), LNCS, vol. 11483. Springer, 2019, pp. 547–562
[Metzger et al. 2022] A. Metzger, C. Quinton, Z. Á. Mann, L. Baresi, K. Pohl, “Realizing Self-Adaptive Systems via Online Reinforcement
Learning and Feature-Model-guided Exploration”, Computing, Springer, March 2022
[Metzger et al. 2020a] A. Metzger, C. Quinton, Z. Mann, L. Baresi, and K. Pohl, “Feature model-guided online reinforcement learning
for self-adaptive services,” in 18th Int’l Conf. on Service-Oriented Computing (ICSOC 2020), LNCS 12571, Springer, 2020
[Metzger et al. 2020b] A. Metzger, T. Kley, and A. Palm, “Triggering proactive business process adaptations via online reinforcement
learning,” in 18th Int’l Conf. on Business Process Management (BPM 2020), LNCS 12168. Springer, 2020
[Metzger, 2021] A. Metzger, “Software Engineering for ECS: Towards Dev-Ops-Adapt” (presentation slides), Workshop on Software in
Electronics, Components and Systems-based Digitisation, Virtual, May 2021
[Palm et al. 2020] A. Palm, A. Metzger, and K. Pohl, “Online reinforcement learning for self-adaptive information systems,” in 32nd Int’l
Conf. on Advanced Information Systems Engineering (CAiSE 2020), LNCS 12127. Springer, 2020
[Salehie & Tahvildari, 2009] M. Salehie and L. Tahvildari, “Self-adaptive software: Landscape and research challenges,” TAAS, vol. 4, no.
2, 2009.
References
[Siegmund et al. 2012] N. Siegmund, S. Kolesnikov, C. Kästner, S. Apel, D. Batory, M. Rosenmüller, G. Saake: Predicting Performance
via Automated Feature-interaction Detection. In: 34th Intl Conf. on Software Engineering (ICSE 2012), pp. 167–177, ACM, 2012
[Sutton & Barto, 2018] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press,
2018
[Taylor & Stone, 2009] M. Taylor, P. Stone: Transfer learning for reinforcement learning domains: A survey. J. Mach. Learn. Res. 10,
1633–1685 (2009)
[Wang et al., 2020] Hongbing Wang, Jiajie Li, Qi Yu, Tianjing Hong, Jia Yan, Wei Zhao: Integrating recurrent neural networks and
reinforcement learning for dynamic service composition. Future Gener. Comput. Syst. 107: 551-563 (2020)
[Weyns et al. 2013] Danny Weyns, Nelly Bencomo, Radu Calinescu, Javier Cámara, Carlo Ghezzi, Vincenzo Grassi, Lars Grunske, Paola
Inverardi, Jean-Marc Jézéquel, Sam Malek, Raffaela Mirandola, Marco Mori, Giordano Tamburrelli: Perpetual Assurances for Self-
Adaptive Systems. Software Engineering for Self-Adaptive Systems 2013: 31-63
[Weyns, 2021] Danny Weyns, Introduction to Self-Adaptive Systems: A Contemporary Software Engineering Perspective, Wiley, 2021.
[Xu et al., 2012] C. Xu, J. Rao, and X. Bu, “URL: A unified reinforcement learning approach for autonomic cloud management,” J.
Parallel Distrib. Comput., vol. 72, no. 2, pp. 95–105, 2012
Speaker Notes
  1. Observing changes in the environment, the requirements, and the system itself; adapting structure, parameters, and behavior. -- M. Papazoglou, K. Pohl, M. Parkin, and A. Metzger, Eds., Service Research Challenges and Solutions for the Future Internet: S-Cube – Towards Mechanisms and Methods for Engineering, Managing, and Adapting Service-Based Systems, ser. LNCS. Heidelberg, Germany: Springer, 2010, vol. 6500.
  2. Observing changes in the environment, the requirements, and the system itself; adapting structure, parameters, and behavior. -- M. Papazoglou, K. Pohl, M. Parkin, and A. Metzger, Eds., Service Research Challenges and Solutions for the Future Internet: S-Cube – Towards Mechanisms and Methods for Engineering, Managing, and Adapting Service-Based Systems, ser. LNCS. Heidelberg, Germany: Springer, 2010, vol. 6500.
  3. Despite these possibilities, concrete problems arise when using ML for adaptive systems; I will discuss two of them in more detail in the remainder of the talk… -- Bradley Schmerl, David Garlan, Christian Kästner - CMU; Danny Weyns – U Leuven; Pooyan Jamshidi – U South Carolina; Javier Camara – U York; Hongbing Wang – U Nanjing; Sven Tomforde – U Kiel --- N. Esfahani, E. Kouroshfar, and S. Malek, “Taming Uncertainty in Self-adaptive Software,” in Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ser. ESEC/FSE ’11, 2011, pp. 234–244. -- A. J. Ramirez, A. C. Jensen, and B. H. C. Cheng, “A taxonomy of uncertainty for dynamically adaptive systems,” in 7th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS, 2012, pp. 99–108. --
  4. Despite these possibilities, concrete problems arise when using ML for adaptive systems; I will discuss two of them in more detail in the remainder of the talk… -- Bradley Schmerl, David Garlan, Christian Kästner - CMU; Danny Weyns – U Leuven; Pooyan Jamshidi – U South Carolina; Javier Camara – U York; Hongbing Wang – U Nanjing; Sven Tomforde – U Kiel --- N. Esfahani, E. Kouroshfar, and S. Malek, “Taming Uncertainty in Self-adaptive Software,” in Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ser. ESEC/FSE ’11, 2011, pp. 234–244. -- A. J. Ramirez, A. C. Jensen, and B. H. C. Cheng, “A taxonomy of uncertainty for dynamically adaptive systems,” in 7th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS, 2012, pp. 99–108. --
  5. Q-Learning: risk of preferring “dangerous” actions. SARSA: is aware of dangerous actions, hence the “safe” path. There are two hyper-parameters: the learning rate α, which defines to what extent newly acquired knowledge overwrites old knowledge, and the discount factor γ, which defines the relevance of future rewards.
  6. The contributions were published, among others, at the SEAMS Symposium and most recently at ICSOC, where we received the Best Paper Award.
  7. Strategy exploits semantics typically encoded in feature models. Non-leaf features are usually abstract features, which delegate their realization to their sub-features. Sub-features thus may offer different realizations of their abstract parent feature. If no configuration containing f or a sibling feature of f is found, then the strategy moves on to the parent feature of f, which is repeated until a configuration is found (line 13) or the root feature is reached (line 22).
  8. We used an ε decay rate of 0.97 (i.e., ε < 1% after time step 150), as this led to fastest convergence with highest asymptotic rewards for ε-greedy. Reason: many configurations with very similar performance.
  9. The contributions were published, among others, at the SEAMS Symposium and most recently at ICSOC, where we received the Best Paper Award.