4. Motivation
we saw that for a general function, finding a solution is extremely hard
Q: what if we restrict to convex problems only?
we analyzed the uniform grid method
Q: what could we do better if min_x f(x) were a convex problem?
5. Brainstorming...
Assumption
We assume that f : R^d → R is convex and L-Lipschitz continuous,
|f(x) − f(y)| ≤ L‖x − y‖ ∀x, y
[figure source: towardsdatascience.com]
Q: How would you define an optimization algorithm that can use only a zero-order oracle?
8. Directional direct-search methods
1: pick an initial solution x_0 ∈ R^d
2: choose a step-size η ∈ R_+
3: pick a set of directions D ⊂ R^d (e.g., D = {±e_i, i ∈ [d] := {1, 2, . . . , d}})
4: for k = 0, 1, . . . do
5:   if ∃ d_j ∈ D such that f(x_k + η d_j) produces sufficient improvement then
6:     x_{k+1} := x_k + η d_j
7:     (optional) increase η = η γ_u with γ_u > 1
8:   else
9:     decrease η = η γ_d with γ_d ∈ (0, 1)
10:    x_{k+1} = x_k
11:  end if
12: end for
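The loop above can be sketched in Python; this is an illustrative implementation that assumes simple decrease as the "sufficient improvement" test, the coordinate directions D = {±e_i}, and γ_u = 2, γ_d = 1/2:

```python
import numpy as np

def coordinate_search(f, x0, eta=1.0, gamma_u=2.0, gamma_d=0.5,
                      eta_min=1e-8, max_iter=1000):
    """Directional direct search with D = {±e_i}: accept the first
    direction that gives simple decrease, else shrink the step size."""
    x = np.asarray(x0, dtype=float)
    d = x.size
    fx = f(x)
    directions = [s * e for e in np.eye(d) for s in (+1.0, -1.0)]
    for _ in range(max_iter):
        for dj in directions:
            trial = x + eta * dj
            f_trial = f(trial)
            if f_trial < fx:          # sufficient improvement (simple decrease)
                x, fx = trial, f_trial
                eta *= gamma_u        # optional expansion step
                break
        else:
            eta *= gamma_d            # no direction improved: contract
            if eta < eta_min:
                break
    return x, fx

# usage: minimize a smooth convex quadratic with minimizer at (1, 1, 1)
x_star, f_star = coordinate_search(lambda x: np.sum((x - 1.0) ** 2),
                                   x0=np.zeros(3))
```

Note that only zero-order oracle calls f(·) are used, matching the setting of the slides.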
19. Polynomial Models
Q: could we pick n points {x_i}_{i=1}^n, evaluate the function values {f(x_i)}_{i=1}^n, and fit a polynomial m(x) that will interpolate the points?
Q: How many points would we need for the example here?
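As a one-dimensional illustration (the data here is hypothetical, not the figure from the slide): n points determine a unique interpolating polynomial of degree n − 1, so three samples of a quadratic recover it exactly:

```python
import numpy as np

# three samples of a quadratic determine it uniquely
f = lambda x: 2 * x ** 2 - x + 1
xs = np.array([-1.0, 0.0, 2.0])
coeffs = np.polyfit(xs, f(xs), deg=len(xs) - 1)  # highest degree first
m = np.poly1d(coeffs)  # the interpolating model m(x)
```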
24. Polynomial Models - Simplified Setting
pick a monomial basis φ(x) = [1, x_1, . . . , x_d, x_1², . . . , x_d²]^T
polynomial model m(x) = Σ_i α_i φ_i(x) for some α_i's
Q: How could we find the α_i's?
Using n points {(x_i, y_i = f(x_i))}_{i=1}^n we want α to satisfy m(x_i) = y_i ∀i
α_1 φ_1(x_1) + α_2 φ_2(x_1) + . . . + α_n φ_n(x_1) = y_1
⋮
α_1 φ_1(x_i) + α_2 φ_2(x_i) + . . . + α_n φ_n(x_i) = y_i
⋮
α_1 φ_1(x_n) + α_2 φ_2(x_n) + . . . + α_n φ_n(x_n) = y_n
in matrix form: [φ(x_1), . . . , φ(x_n)]^T α = [y_1, . . . , y_n]^T
⇒ Least squares!
[Colab: 00_DFO.ipynb]
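A minimal numpy sketch of this least-squares fit with the separable quadratic basis above; the test function and sample points are illustrative:

```python
import numpy as np

def fit_quadratic_model(X, y):
    """Least-squares fit of m(x) = sum_i alpha_i phi_i(x) with the
    basis phi(x) = [1, x_1, ..., x_d, x_1^2, ..., x_d^2]."""
    n, d = X.shape
    Phi = np.hstack([np.ones((n, 1)), X, X ** 2])   # n x (2d + 1) design matrix
    alpha, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    def m(x):
        x = np.asarray(x, dtype=float)
        return alpha @ np.concatenate(([1.0], x, x ** 2))
    return alpha, m

# usage: f(x) = 3 + x_1 - 2 x_2^2 lies in the span of the basis,
# so the least-squares fit recovers it exactly
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 3 + X[:, 0] - 2 * X[:, 1] ** 2
alpha, m = fit_quadratic_model(X, y)
```

With n larger than the number of basis functions the system is overdetermined, which is exactly why the slide ends in "Least squares!".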
26. Polynomial Models - Simplified Setting
after we build a model m(x) ≈ f(x) we can minimize it!
the minimum of m(x) defines a new point x_{n+1} and we can evaluate f(x_{n+1})
Q: what should we do next? – any suggestions are welcome!
[back to Colab]
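One possible next step, sketched below as a naive scheme (not necessarily the lecture's intended answer): minimize the fitted separable quadratic in closed form, evaluate f there, and then the new point could be added to the sample set and the model refit:

```python
import numpy as np

def model_based_step(f, X, y):
    """Fit the separable quadratic m(x) = c + sum_i (b_i x_i + a_i x_i^2)
    by least squares, minimize it coordinate-wise, and evaluate f there."""
    n, d = X.shape
    Phi = np.hstack([np.ones((n, 1)), X, X ** 2])
    coef = np.linalg.lstsq(Phi, y, rcond=None)[0]
    b, a = coef[1:1 + d], coef[1 + d:]
    # each 1-d piece b_i x_i + a_i x_i^2 is minimized at -b_i / (2 a_i)
    # when a_i > 0; fall back to 0 otherwise (model unbounded below)
    x_new = np.where(a > 0, -b / (2 * np.maximum(a, 1e-12)), 0.0)
    return x_new, f(x_new)

# usage: for f(x) = (x_1 - 2)^2 + (x_2 + 1)^2 the model class is exact,
# so a single step lands on the true minimizer (2, -1)
f = lambda x: (x[0] - 2) ** 2 + (x[1] + 1) ** 2
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 2))
y = np.array([f(x) for x in X])
x_new, f_new = model_based_step(f, X, y)
```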
29. Motivation
for any analytic function f(x) we know that the Taylor approximation is locally good
for a given point x_k and radius ∆ we have that ∀x ∈ B(x_k, ∆) it holds
f(x) ≈ f(x_k) + ⟨∇f(x_k), x − x_k⟩ + (1/2) (x − x_k)^T ∇²f(x_k) (x − x_k)
given n = (d + 1)(d + 2)/2 points we can pick a polynomial model
m(x) = Σ_i α_i φ_i(x)
with
φ(x) = [1, x_1, . . . , x_d, x_1², . . . , x_d², x_1 x_2, . . . , x_{d−1} x_d]^T
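A quick sanity check that the basis above indeed has n = (d + 1)(d + 2)/2 entries: 1 constant, d linear terms, d squares, and C(d, 2) cross terms:

```python
from math import comb

def full_quadratic_basis_size(d):
    """1 constant + d linear + d squares + C(d, 2) cross terms."""
    return 1 + d + d + comb(d, 2)

# the count from the slide for d = 1, ..., 5
sizes = [full_quadratic_basis_size(d) for d in range(1, 6)]
```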
31. Trust-Region Radius ∆
we build the model m_k(x) using only points in B(x_k, ∆)
we may need to evaluate f(x) at multiple new points that we may sample from B(x_k, ∆)
let s_k ∈ B(0, ∆) be such that it minimizes the model m_k(x_k + s)
Q: what do we expect about the following quantity?
ρ_k = (f(x_k) − f(x_k + s_k)) / (m_k(x_k) − m_k(x_k + s_k))
if ρ_k ≥ η_1 (with 0 < η_1 < 1) we can trust the model and we can increase ∆
otherwise we have multiple options – improve the model m_k(x) or even decrease ∆
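The ratio test can be sketched as follows; the constant η_1 and the growth/shrink factors are illustrative choices, not values fixed by the slide:

```python
import numpy as np

def trust_region_update(f, m, x, s, delta,
                        eta1=0.1, gamma_inc=2.0, gamma_dec=0.5):
    """One trust-region acceptance test: compare actual vs. predicted
    decrease and adjust the radius delta accordingly."""
    actual = f(x) - f(x + s)          # decrease of the true function
    predicted = m(x) - m(x + s)       # decrease promised by the model
    rho = actual / predicted
    if rho >= eta1:                   # model is trustworthy: accept step, grow delta
        return x + s, delta * gamma_inc
    return x, delta * gamma_dec       # reject the step and shrink delta

# usage: a model that matches f well near x gives rho ≈ 1,
# so the step is accepted and delta grows
f = lambda x: np.sum(x ** 2)
m = lambda x: np.sum(x ** 2)          # perfect model, for illustration only
x = np.array([1.0, 1.0])
s = np.array([-0.5, -0.5])
x_new, delta_new = trust_region_update(f, m, x, s, delta=1.0)
```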
33. Bibliography
This lecture was based on [LMW19].
[LMW19] Jeffrey Larson, Matt Menickelly, and Stefan M. Wild. Derivative-free optimization methods. Acta Numerica, 28:287–404, 2019.