1. Introduction
Stochastic Hybrid Systems
A learning Approach
Discussion & Future Work
A Learning Approach to Verification and Control of
Stochastic Hybrid Systems
Literature Colloquium
Sofie Haesaert, DCSC
Supervisors: dr. ir. A. Abate and prof. dr. R. Babuška
May 28, 2012
1 / 25 Verification & Control of SHS
Outline
Introduction
Air Traffic Safety and Control
Applications
Stochastic Hybrid Systems
Discrete-time Stochastic Hybrid Systems
Stochastic Hybrid Systems: Properties
Verification and Control
A learning Approach
Current Methods: Dynamic Programming
Related Work
Discussion & Future Work
Air Traffic Safety and Control
Flight Safety
Avoid: other airplanes, bad weather conditions, restricted
airspace,...
Reach end destination
Analysis based on air traffic model
Hybrid state space
Stochastic due to wind and turbulence disturbance of flight
path
Safety → Probabilistic
[J. Hu,2003]
Introduction
Other Applications
Systems Biology
DNA replication
HIV Treatment
Industrial Robotics
Pick-and-place tasks
...
[A. Singh,2010]
Discrete-Time Stochastic Hybrid Systems
Hybrid state space $S = \bigcup_{q \in Q} \{q\} \times \mathbb{R}^{n(q)}$
Stochastic transitions
Transition kernels in discrete time for
Discrete transitions Tq
Reset transition Tr
Continuous transitions Tx
Controlled / Autonomous
Control of transitions, with either a continuous or a finite action space
Policy = sequence of controls, one per time step
⇒ Many variations in the definition of an SHS exist, e.g. a single initial state vs.
a set of initial states.
[A. Abate,2008]
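To make the three transition kernels concrete, the following is a minimal sketch of how one trajectory of a discrete-time SHS could be sampled. The names `sample_Tq`, `sample_Tr` and `sample_Tx` mirror the kernels $T_q$, $T_r$ and $T_x$ above; the two-mode system, its dynamics and all numerical values are illustrative assumptions and do not come from the slides.

```python
import random

# Hypothetical two-mode system with a scalar continuous state; all numbers
# below are illustrative assumptions, not taken from the slides.
MODES = ("q1", "q2")
DRIFT = {"q1": 0.9, "q2": 1.1}  # per-mode linear gain


def sample_Tq(q, x, rng):
    """Discrete kernel T_q: switch mode with a state-dependent probability."""
    p_switch = 0.8 if abs(x) > 1.0 else 0.05
    if rng.random() < p_switch:
        return "q2" if q == "q1" else "q1"
    return q


def sample_Tx(q, x, rng):
    """Continuous kernel T_x: noisy linear dynamics while staying in mode q."""
    return DRIFT[q] * x + rng.gauss(0.0, 0.1)


def sample_Tr(q_next, x, rng):
    """Reset kernel T_r: re-draw the continuous state after a mode switch."""
    return 0.5 * x + rng.gauss(0.0, 0.05)


def step(state, rng):
    """One transition of the hybrid state s = (q, x)."""
    q, x = state
    q_next = sample_Tq(q, x, rng)
    if q_next == q:
        return (q, sample_Tx(q, x, rng))
    return (q_next, sample_Tr(q_next, x, rng))


def simulate(s0, horizon, seed=0):
    """Sample a trajectory s_0, ..., s_horizon of the autonomous system."""
    rng = random.Random(seed)
    traj = [s0]
    for _ in range(horizon):
        traj.append(step(traj[-1], rng))
    return traj
```

A controlled variant would pass an action to each kernel; that extension is omitted here to keep the sketch short.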
Reachability Analysis
[Figure: initial state $s_0$ and target set $K$]
Determine if a given SHS will reach a certain target set $K$ within a
time horizon $[0, N]$, starting from a set of initial states $s_0$. $N$ can
either be finite or infinite.
Reach-Avoid Problem
[Figure: initial state $s_0$, safe set $A$ and target set $K$]
Determine the probability $r_{s_0}$ that, given an initial state $s_0$, the
SHS will reach a certain target set $K$ within a time horizon $[0, N]$
while staying inside the safe set $A$.
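Reach-Avoid trajectory:
$j$ = first hitting time of target set $K$, with $j \le N$
State trajectory stays in safe set $A$ until $j$
[Figure: safe set $A$, target set $K$ and initial state $s_0$]
$$r_{s_0} = E_{s_0}\left[ \sum_{j \in [0,N]} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$$
Indicator function: $1_K(s_k) = 1$ if $s_k \in K$, and $0$ otherwise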
[S. Summers,2010]
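The reach-avoid probability can also be read as the expected value of a 0/1 indicator over sampled trajectories. The sketch below shows this with plain Monte-Carlo simulation. The scalar random-walk dynamics in `step`, the safe set $A = [0, 1]$ and the target set $K = [0.8, 1]$ are assumptions chosen only for illustration.

```python
import random


def in_K(s):
    """Target set K = [0.8, 1.0] (illustrative assumption)."""
    return 0.8 <= s <= 1.0


def in_A(s):
    """Safe set A = [0.0, 1.0] (illustrative assumption)."""
    return 0.0 <= s <= 1.0


def step(s, rng):
    """Toy autonomous dynamics: random walk with a small positive drift."""
    return s + rng.gauss(0.02, 0.1)


def reach_avoid_indicator(traj):
    """Return 1 if the trajectory hits K before leaving A, else 0.

    This equals the sum over j of prod_{i<j} 1_{A minus K}(s_i) * 1_K(s_j),
    because at most one j (the first hitting time) contributes.
    """
    for s in traj:
        if in_K(s):
            return 1
        if not in_A(s):
            return 0
    return 0


def estimate_reach_avoid(s0, horizon, n_runs=20000, seed=0):
    """Monte-Carlo estimate of r_{s0} over the horizon [0, N]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        traj = [s0]
        for _ in range(horizon):
            traj.append(step(traj[-1], rng))
        hits += reach_avoid_indicator(traj)
    return hits / n_runs
```

Such an estimate carries sampling error and gives the probability for one initial state only, which is one reason the slides turn to a value-function formulation.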
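Verification: Find the probability associated with a reach-avoid problem
$$r_{s_0} = E_{s_0}\left[ \sum_{j \in [0,N]} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$$
Control: Find a policy $\pi$ that maximizes $r^{\pi}_{s_0}$
$$\sup_{\pi} r^{\pi}_{s_0} = \sup_{\pi} E^{\pi}_{s_0}\left[ \sum_{j \in [0,N]} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$$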
[S. Summers,2010]
Dynamic Programming (1/2)
Define value function $V_k : S \to [0, 1]$
$$V_k(s) = E_s\left[ \sum_{j \in [k,N]} \left( \prod_{i=k}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$$
Then it follows that $V_0(s_0) = r_{s_0}$.
Dynamic Programming (2/2)
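Verification: For $k = N-1, \dots, 0$, iterate backward
$$V_k(s) = 1_K(s) + 1_{A \setminus K}(s)\, E_s[V_{k+1}], \quad \forall s \in S$$
with $V_N(s) = 1_K(s)\ \forall s \in S$. Then $V_0(s_0) = r_{s_0}$.
Control: Maximization at every iteration step
$$V_k^{*}(s) = \sup_{\pi} \left\{ 1_K(s) + 1_{A \setminus K}(s)\, E^{\pi}_s[V_{k+1}^{*}] \right\}, \quad \forall s \in S$$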
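On a finite state space the recursion can be carried out exactly. The following is a minimal sketch, assuming a finite Markov chain with integer states and one transition matrix per action; the function name `reach_avoid_dp` and the data layout are hypothetical. With a single action it performs verification, with several actions it takes the maximum at every step as in the control case.

```python
def reach_avoid_dp(P_by_action, K, A, N):
    """Backward reach-avoid recursion on a finite chain.

    P_by_action maps an action label to a matrix P, where P[s][t] is the
    probability of moving from state s to state t under that action.
    K and A are sets of state indices. Returns the list [V_0, ..., V_N].
    """
    actions = list(P_by_action)
    n = len(P_by_action[actions[0]])
    ind_K = [1.0 if s in K else 0.0 for s in range(n)]
    # Indicator of A \ K: safe states that are not yet in the target set.
    ind_AK = [1.0 if (s in A and s not in K) else 0.0 for s in range(n)]

    V = [None] * (N + 1)
    V[N] = ind_K[:]  # terminal condition V_N = 1_K
    for k in range(N - 1, -1, -1):
        V[k] = []
        for s in range(n):
            # Expected next-step value, maximized over the finite action set.
            best = max(
                sum(P_by_action[a][s][t] * V[k + 1][t] for t in range(n))
                for a in actions
            )
            V[k].append(ind_K[s] + ind_AK[s] * best)
    return V


# Example: random walk on states 0..3, state 0 unsafe, state 3 the target.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
V = reach_avoid_dp({"u": P}, K={3}, A={1, 2, 3}, N=2)
print(V[0])
```

For this example and $N = 2$, the recursion gives $V_0 = (0, 0.25, 0.5, 1)$: from state 2 the target is reached in one step with probability 0.5, and from state 1 it takes two favourable steps.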
Computational Issues
Recursion often cannot be written out analytically
→ Approximation: $V_k \approx \hat{V}_k$
Curse of Dimensionality
Difference between exact solution and approximation
Approximations of value function and/or policy include:
Discrete approximation: Discretization of action-state space
Functional approximation over action-state space
Current Methods: Dynamic Programming
Discretization: Partitioning of state space
Finite action space and hybrid state space
⇓
Markov Chain (= finite action-state process)
$\hat{V}_k$ = tabular form
Figure: Discretization of Hybrid State Space (S)
[A. Abate, 2007]
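As an illustration of the partitioning step, the sketch below grids a one-dimensional continuous state space and builds the transition matrix of the resulting finite chain. The scalar dynamics $x' = a x + w$ with Gaussian noise, the interval $[0, 1]$, the use of cell centers as representative points and the extra absorbing "outside" state are all assumptions made for this example; they are not the partitioning scheme of the cited work.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discretize(n_cells, a=0.9, sigma=0.1, lo=0.0, hi=1.0):
    """Return (centers, P) for x' = a*x + w, w ~ N(0, sigma^2), on [lo, hi].

    P has n_cells + 1 rows and columns; the last index is an absorbing
    'outside' state that collects probability mass leaving [lo, hi].
    """
    width = (hi - lo) / n_cells
    edges = [lo + i * width for i in range(n_cells + 1)]
    centers = [0.5 * (edges[i] + edges[i + 1]) for i in range(n_cells)]

    P = []
    for c in centers:
        mean = a * c  # next-state mean when starting from the cell center
        row = [
            normal_cdf((edges[j + 1] - mean) / sigma)
            - normal_cdf((edges[j] - mean) / sigma)
            for j in range(n_cells)
        ]
        row.append(max(0.0, 1.0 - sum(row)))  # mass that leaves [lo, hi]
        P.append(row)
    P.append([0.0] * n_cells + [1.0])  # 'outside' is absorbing
    return centers, P


centers, P = discretize(10)
```

The number of cells grows exponentially with the dimension of the continuous state, which is the scaling problem the following slide refers to.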
+ Error bounds available
− Sensitive to the partitioning method
− Curse of Dimensionality: poor scaling to higher dimensions
Goal
Less conservative error bounds
Optimal partitioning
Action partitioning
Functional approximation
Current Methods: Dynamic Programming
Functional Approximation:
SHS with finite action space
$$\hat{V}_k(s, \theta_k) = \sum_{i=1}^{h} \theta^{q}_{i,k}\, \phi_i(x), \quad s = (q, x) \in S$$
Parameter vector $\theta_k = (\theta_k^{q_1}, \dots, \theta_k^{q_m})$,
for each discrete mode $q$: $\theta_k^{q} = (\theta^{q}_{1,k}, \dots, \theta^{q}_{h,k})$
[A. Abate, 2008]
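The parameterization can be sketched in a few lines: one weight vector per discrete mode, applied to a shared set of basis functions of the continuous state. Gaussian radial basis functions are used here as an assumption for illustration; the slides do not specify the basis $\phi_i$.

```python
import math


def rbf_features(x, centers, width):
    """Basis functions phi_i(x): Gaussian bumps (an illustrative choice)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def v_hat(s, theta, centers, width=0.5):
    """Evaluate V_hat(s, theta) = sum_i theta[q][i] * phi_i(x) for s = (q, x)."""
    q, x = s
    phi = rbf_features(x, centers, width)
    return sum(t * p for t, p in zip(theta[q], phi))


# Two modes, two basis functions each (hypothetical numbers).
centers = [0.0, 1.0]
theta = {"q1": [1.0, 0.0], "q2": [0.0, 2.0]}
value = v_hat(("q2", 1.0), theta, centers)
```

The number of parameters is $m \cdot h$ regardless of how finely the continuous state would otherwise have to be gridded.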
Functional Approximation:
− Only applied to safety problems
− Only applied to finite action spaces
− No error bounds (yet)
+ Curse of Dimensionality: better scaling
A Learning Approach: Related Work on Discounted Return Problems
Control objective: $\max_{\pi} J^{\pi}(s_0) = \max_{\pi} E^{\pi}_{s_0}\left[ \sum_{k=0}^{N} \gamma^{k} r_k \right]$,
with $r_k$ the reward at time step $k$ and $\gamma \in [0, 1)$ the discount factor.
Model Free
Samples $(s_k, a, s_{k+1}, r_k)$
Approximation methods for continuous state spaces
Most methods for N → ∞
e.g. (Approximate) Q-learning, LSPI, actor-critic, ...
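For reference, the discounted return of a single sampled reward sequence can be computed as follows; this is a small sketch of the sum inside the expectation, and the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last reward."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


example = discounted_return([1.0, 1.0, 1.0], 0.5)  # 1 + 0.5 + 0.25
```

Model-free methods estimate the expectation of this quantity from samples instead of from a known transition kernel.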
Fitted Value Iteration (1/2)
1. Collect samples $(s_k, a, s_{k+1}, r_k)$ at $M$ states: $s_i$, $i = 1, \dots, M$
2. Estimate the value function at the $M$ states: $\tilde{V}_k(s_i)$, $i = 1, \dots, M$
3. Fit the value function to $\tilde{V}_k$: $\hat{V}_k = \mathrm{fit}(\tilde{V}_k)$
4. $k \leftarrow k - 1$, go to 2
[R. Munos, 2008]
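The four steps can be sketched for a reach-avoid objective as shown below. This is not the algorithm of the cited paper in detail: the scalar toy dynamics, the sets $A$ and $K$, the three-element action set and the kernel-smoother used as the `fit` step are all assumptions chosen so that the example runs with the standard library only.

```python
import math
import random

ACTIONS = (-0.1, 0.0, 0.1)  # hypothetical finite action set


def in_K(x):
    return 0.8 <= x <= 1.0  # target set (assumption)


def in_A(x):
    return 0.0 <= x <= 1.0  # safe set (assumption)


def sample_next(x, a, rng):
    return x + a + rng.gauss(0.0, 0.05)  # toy controlled dynamics (assumption)


def indicator_K(x):
    return 1.0 if in_K(x) else 0.0


def make_fit(points, values, bandwidth=0.05):
    """Step 3: kernel-smoother fit through the estimates at the base points.

    The result is 1 on K and 0 outside A, and smoothed values elsewhere.
    """
    def v_hat(x):
        if in_K(x):
            return 1.0
        if not in_A(x):
            return 0.0
        w = [math.exp(-((x - p) ** 2) / (2.0 * bandwidth ** 2)) for p in points]
        total = sum(w)
        if total == 0.0:
            return 0.0
        return sum(wi * vi for wi, vi in zip(w, values)) / total
    return v_hat


def fitted_value_iteration(N, M=41, h=30, seed=0):
    rng = random.Random(seed)
    points = [i / (M - 1) for i in range(M)]  # step 1: M base states in [0, 1]
    v_next = indicator_K                      # terminal condition V_N = 1_K
    for _ in range(N):                        # k = N-1, ..., 0 (step 4)
        targets = []
        for s in points:                      # step 2: Monte-Carlo estimates
            if in_K(s):
                targets.append(1.0)
            else:
                targets.append(max(
                    sum(v_next(sample_next(s, a, rng)) for _ in range(h)) / h
                    for a in ACTIONS
                ))
        v_next = make_fit(points, targets)
    return v_next
```

Calling `fitted_value_iteration(N=5)` returns an approximation of $V_0$ that can be evaluated at any continuous state; its accuracy depends on `M`, `h` and the bandwidth.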
Fitted Value Iteration (2/2)
Finite action space & continuous state space
Monte-Carlo approximations
Probabilistic error bounds on the value functions, which depend on
∼ the descriptive power of the approximation functions
∼ the limited number of samples in the Monte-Carlo approximations
Extensions/variations available for
Sample usage
Action-value function : Q(s, a)
Continuous states and actions
Plan of Work
1. Control Synthesis: Fitted Value Iteration for SHS with
Finite control space
Finite horizon N
Batch samples
Kernel-based approximation
2. Finite Horizon Error Bounds
3. Infinite Horizon Error Bounds
4. Extensions: tree-based fitted Q-iteration, continuous actions, infinite horizon
Other research lines:
Functional approximation after discretization
LSPI for N → ∞
...
Thank you for your time
Are there any questions?
Appendix Slides
Discounted Return vs Reach-avoid
FVI: Formulas
Discounted return:
$$V_k(s) = r_k + \gamma E_s[V_{k+1}] \quad \forall s \in S$$
Reach-avoid:
$$V_k(s) = 1_K(s) + 1_{A \setminus K}(s)\, E_s[V_{k+1}] \quad \forall s \in S$$
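1. Estimate the value function at the $M$ states: $\tilde{V}_k(s_i)$, $i = 1, \dots, M$
For a given function $\hat{V}_k$, and using samples, the Monte-Carlo
estimate $\tilde{V}$ of $T(\hat{V}_k)$ can be determined at the $M$ base
points as follows:
$$\tilde{V}(s_i) = \max_{a \in A} \frac{1}{h} \sum_{j=1}^{h} \left[ r_j^{s_i,a} + \gamma \hat{V}_k\left(s_{k+1}^{s_i,a}\right) \right]$$
2. Fit the value function to $\tilde{V}_k$:
$$\hat{V}_{k+1} = \arg\min_{f \in F} \sum_{i=1}^{M} \left| f(s_i) - \tilde{V}(s_i) \right|^{p}$$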