Presentation

Introduction
Stochastic Hybrid Systems
A learning Approach
Discussion & Future Work

A Learning Approach to Verification and Control of Stochastic Hybrid Systems
Literature Colloquium
Sofie Haesaert, DCSC
Supervisors: dr. ir. A. Abate and prof. dr. R. Babuška
May 28, 2012

Honeywell.com

1 / 25 Verification & Control of SHS


Outline
Introduction
Air Traffic Safety and Control
Applications

Discrete-time Stochastic Hybrid Systems
Stochastic Hybrid Systems: Properties
Verification and Control

A learning Approach
Current Methods: Dynamic Programming
Related Work




Air Traffic Safety and Control

Flight Safety
Avoid: other airplanes, bad weather conditions, restricted
airspace,...
Reach end destination

Analysis based on air traffic model
Hybrid state space
Stochastic due to wind and turbulence disturbance of flight
path

Safety → Probabilistic
[J.Hu,2003]


Introduction

Other Applications
Systems Biology
DNA replication
HIV Treatment
Industrial Robotics
Pick-and-place tasks
...

[A. Singh,2010]


Discrete-Time Stochastic Hybrid Systems

Hybrid state space $S = \bigcup_{q \in Q} \{q\} \times \mathbb{R}^{n(q)}$
Stochastic transitions
Transition kernels in discrete time for
Discrete transitions Tq
Reset transition Tr
Continuous transitions Tx
Controlled / Autonomous
Control of transitions, with either a continuous or a finite action space
Policy = string of controls
⇒ Many variations in the definition of SHS, e.g. initial states vs.
initial subsets.
[A. Abate,2008]
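
To make the three transition kernels concrete, here is a minimal simulation sketch of a toy controlled discrete-time SHS. Everything in it is an illustrative assumption rather than a model from the slides: two modes, a scalar continuous state, the jump probabilities, the noise levels, and the names `shs_step` and `simulate`.

```python
import random

def shs_step(q, x, a, rng):
    """One transition of a toy controlled DTSHS with modes {0, 1} and scalar x.

    All numbers below are illustrative assumptions, not values from the slides.
    """
    # Discrete kernel T_q: probability of jumping to the other mode,
    # here made to depend on the chosen action a in {0, 1}.
    p_jump = 0.1 if a == 0 else 0.3
    q_next = 1 - q if rng.random() < p_jump else q

    if q_next != q:
        # Reset kernel T_r: after a jump, x is re-drawn near the origin.
        x_next = rng.gauss(0.0, 0.1)
    else:
        # Continuous kernel T_x: mode-dependent linear map plus Gaussian noise.
        gain = 0.9 if q == 0 else 1.1
        x_next = gain * x + rng.gauss(0.0, 0.2)
    return q_next, x_next

def simulate(q0, x0, policy, horizon, seed=0):
    """Roll out s_0, ..., s_N under a policy given as a list of N actions."""
    rng = random.Random(seed)
    trajectory = [(q0, x0)]
    for a in policy[:horizon]:
        trajectory.append(shs_step(*trajectory[-1], a, rng))
    return trajectory
```

Calling `simulate(0, 1.0, [0, 1, 0, 1, 0], 5)` returns six hybrid states `(q, x)`, one per time step from 0 to N = 5.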


Reachability Analysis

Figure: initial state s0 and target set K

Determine whether a given SHS will reach a target set K within the
time horizon [0, N], starting from a set of initial states s0. N can
be either finite or infinite.



Reach-Avoid Problem

Figure: initial state s0, safe set A and target set K

Determine the probability $r_{s_0}$ that, given an initial state $s_0$, the
SHS will reach a target set K within the time horizon [0, N]
while staying inside the safe set A.



Reach-Avoid trajectory:
j = first hitting time of target set K, with j ≤ N
State trajectory stays in safe set A until j

$$r_{s_0} = E_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$$

Indicator function: $1_K(s_k) = 1$ if $s_k \in K$, and $0$ otherwise

[S. Summers,2010]
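
The sum-of-products inside the expectation can be evaluated for a single trajectory, which shows that it equals 1 exactly when the trajectory hits K at some j ≤ N after staying in A \ K up to j − 1, and 0 otherwise. The sketch below is illustrative only: the function name and the predicates `in_safe` and `in_target` are hypothetical.

```python
def reach_avoid_payoff(trajectory, in_safe, in_target):
    """Evaluate sum_j (prod_{i<j} 1_{A\\K}(s_i)) * 1_K(s_j) for s_0, ..., s_N."""
    total = 0
    still_ok = 1  # running product of 1_{A\K}(s_i) over i < j
    for s in trajectory:
        total += still_ok * int(in_target(s))
        still_ok *= int(in_safe(s) and not in_target(s))
    return total

# Assumed example sets on the real line: A = [0, 10], K = [8, 10].
in_A = lambda s: 0.0 <= s <= 10.0
in_K = lambda s: 8.0 <= s <= 10.0
```

With these sets, `reach_avoid_payoff([5, 6, 8, 12], in_A, in_K)` counts the trajectory as a success, while `[5, 11, 8]` does not, because it leaves A before reaching K.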


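Verification: Find the probability associated with a reach-avoid
problem

$$r_{s_0} = E_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$$

Control: Find a policy π that maximizes $r^{\pi}_{s_0}$

$$\sup_{\pi} r^{\pi}_{s_0} = \sup_{\pi} E^{\pi}_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$$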

[S. Summers,2010]
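
As a point of comparison for the verification problem, the reach-avoid probability can also be estimated by plain simulation. This is a sketch under stated assumptions, not the dynamic-programming method these slides go on to use: the system is a hypothetical autonomous scalar random walk with unit Gaussian steps, with A = [0, 10] and K = [8, 10].

```python
import random

def estimate_reach_avoid(s0, horizon, n_runs, seed=0):
    """Monte Carlo estimate of r_{s0} for an assumed random walk on the line."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        s = s0
        for _ in range(horizon + 1):       # check s_0, ..., s_N
            if 8.0 <= s <= 10.0:           # s in K: reach-avoid success
                hits += 1
                break
            if not (0.0 <= s <= 10.0):     # s left the safe set A: failure
                break
            s = s + rng.gauss(0.0, 1.0)    # assumed dynamics: s' = s + w
    return hits / n_runs
```

The estimate is the fraction of simulated trajectories that reach K before leaving A, so its accuracy is limited by the number of runs.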


Dynamic Programming (1/2)

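Define the value function $V_k : S \to [0, 1]$

$$V_k(s) = E_s\left[ \sum_{j=k}^{N} \left( \prod_{i=k}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$$

where the expectation is taken over trajectories with $s_k = s$.
Then it follows that $V_0(s_0) = r_{s_0}$.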
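
A short derivation, sketched here under the assumption that the process is Markov (as the transition-kernel definition suggests), shows why this value function satisfies a one-step backward recursion. Split off the j = k term of the sum, whose empty product equals 1, and factor $1_{A \setminus K}(s_k)$ out of the remaining products:

```latex
\begin{align*}
V_k(s) &= E_s\Big[ 1_K(s_k) + 1_{A\setminus K}(s_k) \sum_{j=k+1}^{N}
          \Big( \prod_{i=k+1}^{j-1} 1_{A\setminus K}(s_i) \Big) 1_K(s_j) \Big] \\
       &= 1_K(s) + 1_{A\setminus K}(s)\, E_s\big[ V_{k+1}(s_{k+1}) \big],
          \qquad V_N(s) = 1_K(s).
\end{align*}
```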



Dynamic Programming (2/2)

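Verification: For k = N − 1, . . . , 0, iterate backward

$$V_k(s) = 1_K(s) + 1_{A \setminus K}(s)\, E_s[V_{k+1}], \quad \forall s \in S$$

with $V_N(s) = 1_K(s)$, $\forall s \in S$. Then $V_0(s_0) = r_{s_0}$.

Control: Maximization at every iteration step

$$V^*_k(s) = \sup_{\pi} \left\{ 1_K(s) + 1_{A \setminus K}(s)\, E^{\pi}_s[V^*_{k+1}] \right\}, \quad \forall s \in S$$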
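
On a finite state space the recursion can be carried out exactly. The following is a minimal sketch for a hypothetical finite-state, finite-action model; the function name, the nested-list layout `P[a][s][t]`, and the boolean lists `safe` and `target` are assumptions made for illustration. With a single action it computes the verification recursion, and with several actions the maximization gives the control version.

```python
def reach_avoid_dp(P, safe, target, horizon):
    """Backward recursion V_k = 1_K + 1_{A\\K} * max_a E[V_{k+1}].

    P[a][s][t] is the probability of moving from state s to t under action a.
    Returns the list V_0(s) for every state s.
    """
    n = len(safe)
    V = [1.0 if target[s] else 0.0 for s in range(n)]   # V_N = 1_K
    for _ in range(horizon):                            # k = N-1, ..., 0
        V_next, V = V, []
        for s in range(n):
            if target[s]:
                V.append(1.0)
            elif safe[s]:
                V.append(max(sum(p * v for p, v in zip(P_a[s], V_next))
                             for P_a in P))
            else:
                V.append(0.0)
    return V

# Assumed 4-state example: state 0 is unsafe, 1 and 2 are safe, 3 is the target.
P_a = [[1, 0, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 0, 1]]
P_b = [[1, 0, 0, 0], [0.1, 0, 0.9, 0], [0, 0.5, 0, 0.5], [0, 0, 0, 1]]
safe = [False, True, True, True]
target = [False, False, False, True]
```

For this example, `reach_avoid_dp([P_a], safe, target, 2)` evaluates the autonomous chain, and `reach_avoid_dp([P_a, P_b], safe, target, 2)` additionally picks the better action in each state.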



Computational Issues

Recursion often cannot be written out analytically
→ Approximation: $V_k \approx \hat{V}_k$

Curse of Dimensionality
Difference between exact solution and approximation
Approximations of value function and/or policy include:
Discrete approximation: Discretization of action-state space
Functional approximation over action-state space




Discretization: Partitioning of state space

Finite action space and hybrid state space
⇓
Markov Chain
(= finite action-state process)
$\hat{V}_k$ = Tabular form
Figure: Discretization of Hybrid State Space (S)

[A. Abate, 2007]
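
A minimal sketch of such a partitioning for one continuous dimension follows. The dynamics are an assumption chosen for illustration (x' = x + w with Gaussian w of standard deviation `sigma`), as are the uniform grid, the use of cell centres as representative points, and the function names; the extra last column collects the probability of leaving the gridded interval.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def discretize(lo, hi, n_cells, sigma):
    """Grid [lo, hi] into n_cells and build a finite transition matrix.

    Row i holds the probabilities of moving from the centre of cell i into
    each cell, plus a final entry for leaving [lo, hi] altogether.
    """
    width = (hi - lo) / n_cells
    centers = [lo + (i + 0.5) * width for i in range(n_cells)]
    P = []
    for c in centers:
        row = [normal_cdf((lo + (j + 1) * width - c) / sigma)
               - normal_cdf((lo + j * width - c) / sigma)
               for j in range(n_cells)]
        row.append(1.0 - sum(row))  # mass that exits the gridded interval
        P.append(row)
    return centers, P
```

The resulting matrix can be stored as a table and used in a finite-state recursion; a finer grid reduces the approximation error but increases the table size, which is where the curse of dimensionality enters.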


+ Error Bounds
− Partitioning method
− Curse of Dimensionality: poor scaling to higher dimensions
Goal
Less conservative error bounds
Optimal partitioning
Action partitioning
Functional approximation




Functional Approximation:
SHS with finite action space

$$\hat{V}_k(s, \theta_k) = \sum_{i=1}^{h} \theta^{q}_{i,k}\, \phi_i(x), \quad s = (q, x) \in S$$

Parameter vector $\theta_k = (\theta^{q_1}_k, \ldots, \theta^{q_m}_k)$,
for each discrete mode q: $\theta^{q}_k = (\theta^{q}_{1,k}, \ldots, \theta^{q}_{h,k})$

[A. Abate, 2008]
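
The linear-in-parameters form can be illustrated with a short evaluation routine. This is a sketch only: the choice of Gaussian radial basis functions for the $\phi_i$, the scalar continuous state, and the names `rbf_features` and `v_hat` are assumptions, since the slides do not fix a basis.

```python
import math

def rbf_features(x, centers, width):
    """Basis functions phi_i(x); Gaussian bumps are an assumed choice."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def v_hat(s, theta, centers, width):
    """Evaluate sum_i theta[q][i] * phi_i(x) for a hybrid state s = (q, x)."""
    q, x = s
    return sum(t * phi for t, phi in zip(theta[q], rbf_features(x, centers, width)))

# One weight vector per discrete mode q, each with h = 2 entries.
theta = {0: [1.0, 0.0], 1: [0.0, 2.0]}
centers = [0.0, 1.0]
```

Here `v_hat((0, 0.0), theta, centers, 1.0)` uses the weights of mode 0 and `v_hat((1, 1.0), theta, centers, 1.0)` those of mode 1, so the number of stored parameters is m times h rather than one value per grid cell.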


Functional Approximation:

− Only applied to safety problems
− Only applied to finite action spaces
− No error bounds (yet)
+ Curse of Dimensionality: better scaling



A Learning Approach: Related Work on Discounted Return
Problems

Control objective: $$\max_{\pi} J^{\pi}(s_0) = \max_{\pi} E^{\pi}_{s_0}\left[ \sum_{k=0}^{N} \gamma^k r_k \right]$$
with $r_k$ the reward at time k and $\gamma \in [0, 1)$ the discount factor.
Model Free
Samples (sk , a, sk+1 , rk )
Approximation methods for continuous state spaces
Most methods for N → ∞
e.g. (Approximate) Q-learning, LSPI, actor-critic, ...
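
For one observed reward sequence, the quantity inside the expectation of the control objective is a plain weighted sum. A minimal sketch, with a hypothetical function name:

```python
def discounted_return(rewards, gamma):
    """Compute sum_{k=0}^{N} gamma**k * r_k for one reward sequence."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total
```

For example, three unit rewards with γ = 0.5 give 1 + 0.5 + 0.25. Averaging this quantity over many sampled trajectories under a fixed policy gives a model-free estimate of $J^{\pi}(s_0)$.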



Fitted Value Iteration (1/2)

1. Collect samples $(s_k, a, s_{k+1}, r_k)$ at M states:
$s_i$, i = 1, . . . , M
2. Estimate the value function at the M states:
$\tilde{V}_k(s_i)$, i = 1, . . . , M
3. Fit the value function to $\tilde{V}_k$:

$\hat{V}_k = \mathrm{fit}(\tilde{V}_k)$

4. k ← k − 1, go to 2

[R. Munos, 2008]
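
The four steps can be sketched for a finite-horizon reach-avoid objective on a scalar state. This is an illustration under several assumptions, not the algorithm of the cited work: the fit is a piecewise-linear interpolant through the base points (so it matches the estimates exactly), the sampler `sample_next` stands in for collected samples, and all function and parameter names are hypothetical.

```python
import random

def interpolate(pts, vals, x):
    """Piecewise-linear fit through (pts, vals); pts sorted and distinct."""
    if x <= pts[0]:
        return vals[0]
    if x >= pts[-1]:
        return vals[-1]
    for i in range(len(pts) - 1):
        if pts[i] <= x <= pts[i + 1]:
            w = (x - pts[i]) / (pts[i + 1] - pts[i])
            return (1.0 - w) * vals[i] + w * vals[i + 1]

def fvi_reach_avoid(pts, actions, sample_next, in_safe, in_target,
                    horizon, h, seed=0):
    """Fitted value iteration with the reach-avoid backup, k = N-1, ..., 0."""
    rng = random.Random(seed)

    def value(x, vals):
        # Exact values on K and outside A; fitted values on A \ K.
        if in_target(x):
            return 1.0
        if not in_safe(x):
            return 0.0
        return interpolate(pts, vals, x)

    vals = [1.0 if in_target(x) else 0.0 for x in pts]   # V_N = 1_K
    for _ in range(horizon):
        new_vals = []
        for x in pts:
            if in_target(x) or not in_safe(x):
                new_vals.append(value(x, vals))
                continue
            # Step 2: Monte Carlo estimate with h samples per action,
            # maximised over the finite action set.
            estimates = [sum(value(sample_next(x, a, rng), vals)
                             for _ in range(h)) / h
                         for a in actions]
            new_vals.append(max(estimates))
        vals = new_vals   # Step 3: the interpolant through new_vals is the fit
    return lambda x: value(x, vals)
```

A possible call is `fvi_reach_avoid([float(i) for i in range(11)], [-1.0, 1.0], lambda x, a, rng: x + a + rng.gauss(0.0, 0.5), in_safe, in_target, 5, 50)` with user-supplied predicates for A and K; the returned function approximates $V_0$.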


Fitted Value Iteration (2/2)

Finite action space & continuous state space
Monte-Carlo approximations
Probabilistic error bounds on the value functions
∼ descriptive power of approximation functions
∼ limited number of samples in Monte-Carlo
approximations
Extension/Variations available for
Samples usage
Action-value function : Q(s, a)
Continuous states and actions



Plan of Work

1. Control Synthesis: Fitted Value Iteration for SHS with
Finite control Space
Finite Horizon N
Batch samples
Kernel Based approximation
2. Finite Horizon Error Bounds
3. Infinite Horizon Error Bounds
4. Extensions: tree-based fitted Q-iteration, continuous action,
Infinite Horizon



Other research lines:
Functional approximation after discretization
LSPI for N → ∞
...



Thank you for your time
Are there any questions?


Discounted Return vs Reach-avoid
FVI: Formulas

Appendix Slides




FVI: Formulas

discounted return

$$V_k(s) = r_k + \gamma E_s[V_{k+1}], \quad \forall s \in S$$

reach-avoid

$$V_k(s) = 1_K(s) + 1_{A \setminus K}(s)\, E_s[V_{k+1}], \quad \forall s \in S$$


FVI: Formulas

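1. Estimate the value function at the M states: $\tilde{V}_k(s_i)$, i = 1, . . . , M
For a given function $\hat{V}_k$, and using samples, the Monte Carlo
estimate $\tilde{V}$ of $T(\hat{V}_k)$ can be determined at the M base
points as follows:

$$\tilde{V}(s_i) = \max_{a \in A} \frac{1}{h} \sum_{j=1}^{h} \left[ r_j^{s_i,a} + \gamma \hat{V}_k\left(s_{k+1,j}^{s_i,a}\right) \right]$$

where $r_j^{s_i,a}$ and $s_{k+1,j}^{s_i,a}$ are the reward and successor state of
the j-th of h samples drawn at $s_i$ under action a.

2. Fit the value function to $\tilde{V}_k$:

$$\hat{V}_{k+1} = \arg\min_{f \in F} \sum_{i=1}^{M} \left| f(s_i) - \tilde{V}(s_i) \right|^p$$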
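
The fitting step can be made concrete for the simplest case. The sketch below assumes p = 2, a scalar state, and a function class F of straight lines f(x) = a + b·x, for which the minimiser has a closed form; the function name `fit_line` is hypothetical, and richer function classes would need a general least-squares solver.

```python
def fit_line(points, targets):
    """Least-squares line through (s_i, V~(s_i)): arg min of the p = 2 loss."""
    m = len(points)
    mean_x = sum(points) / m
    mean_y = sum(targets) / m
    sxx = sum((x - mean_x) ** 2 for x in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(points, targets))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return lambda x: a + b * x
```

Given base points `[0.0, 1.0, 2.0]` with estimates `[1.0, 3.0, 5.0]`, the returned function reproduces the estimates exactly because they already lie on a line; in general the fit only approximates them, which is one source of the error analysed for fitted value iteration.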

