4. Motivation
we saw that for a general function, finding a solution is extremely hard
Q: what if we restrict to convex problems only?
we analyzed the uniform grid method
Q: what could we do better if min_x f(x) were a convex problem?
5. Brainstorming...
Assumption
We assume that f : R^d → R is convex and L-Lipschitz continuous,
|f(x) − f(y)| ≤ L‖x − y‖ ∀x, y
[figure source: towardsdatascience.com]
Q: How would you define an optimization algorithm that can use only a zero-order oracle?
8. Directional direct-search methods
1: pick an initial solution x_0 ∈ R^d
2: choose a step-size η ∈ R_+
3: pick a set of directions D ⊂ R^d (e.g., D = {±e_i, i ∈ [d] := {1, 2, . . . , d}})
4: for k = 0, 1, . . . do
5:   if ∃ d_j ∈ D such that f(x_k + η d_j) produces sufficient improvement then
6:     x_{k+1} := x_k + η d_j
7:     (optional) increase η = η γ_u with γ_u > 1
8:   else
9:     decrease η = η γ_d with γ_d ∈ (0, 1)
10:    x_{k+1} = x_k
11:  end if
12: end for
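The loop above can be sketched in Python; this is an illustrative implementation that assumes simple decrease as the "sufficient improvement" test, the coordinate directions D = {±e_i}, and γ_u = 2, γ_d = 1/2:

```python
import numpy as np

def coordinate_search(f, x0, eta=1.0, gamma_u=2.0, gamma_d=0.5,
                      eta_min=1e-8, max_iter=1000):
    """Directional direct search with D = {±e_i}: accept the first
    direction that gives simple decrease, else shrink the step size."""
    x = np.asarray(x0, dtype=float)
    d = x.size
    fx = f(x)
    directions = [s * e for e in np.eye(d) for s in (+1.0, -1.0)]
    for _ in range(max_iter):
        for dj in directions:
            trial = x + eta * dj
            f_trial = f(trial)
            if f_trial < fx:          # sufficient improvement (simple decrease)
                x, fx = trial, f_trial
                eta *= gamma_u        # optional expansion step
                break
        else:
            eta *= gamma_d            # no direction improved: contract
            if eta < eta_min:
                break
    return x, fx

# usage: minimize a smooth convex quadratic with minimizer at (1, 1, 1)
x_star, f_star = coordinate_search(lambda x: np.sum((x - 1.0) ** 2),
                                   x0=np.zeros(3))
```

Note that only zero-order oracle calls f(·) are used, matching the setting of the slides.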
19. Polynomial Models
Q: could we pick n points {x_i}_{i=1}^n, evaluate the function values {f(x_i)}_{i=1}^n, and fit a polynomial m(x) that will interpolate the points?
Q: How many points would we need for the example here?
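As a one-dimensional illustration (the data here is hypothetical, not the figure from the slide): n points determine a unique interpolating polynomial of degree n − 1, so three samples of a quadratic recover it exactly:

```python
import numpy as np

# three samples of a quadratic determine it uniquely
f = lambda x: 2 * x ** 2 - x + 1
xs = np.array([-1.0, 0.0, 2.0])
coeffs = np.polyfit(xs, f(xs), deg=len(xs) - 1)  # highest degree first
m = np.poly1d(coeffs)  # the interpolating model m(x)
```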
24. Polynomial Models - Simplified Setting
pick a monomial basis φ(x) = [1, x_1, . . . , x_d, x_1², . . . , x_d²]^T
polynomial model m(x) = Σ_i α_i φ_i(x) for some α_i's
Q: How could we find the α_i's?
Using n points {(x_i, y_i = f(x_i))}_{i=1}^n we want α to satisfy m(x_i) = y_i ∀i
α_1 φ_1(x_1) + α_2 φ_2(x_1) + . . . + α_n φ_n(x_1) = y_1
⋮
α_1 φ_1(x_i) + α_2 φ_2(x_i) + . . . + α_n φ_n(x_i) = y_i
⋮
α_1 φ_1(x_n) + α_2 φ_2(x_n) + . . . + α_n φ_n(x_n) = y_n
in matrix form: [φ(x_1), . . . , φ(x_n)]^T α = [y_1, . . . , y_n]^T
⇒ Least squares!
[Colab: 00_DFO.ipynb]
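A minimal numpy sketch of this least-squares fit with the separable quadratic basis above; the test function and sample points are illustrative:

```python
import numpy as np

def fit_quadratic_model(X, y):
    """Least-squares fit of m(x) = sum_i alpha_i phi_i(x) with the
    basis phi(x) = [1, x_1, ..., x_d, x_1^2, ..., x_d^2]."""
    n, d = X.shape
    Phi = np.hstack([np.ones((n, 1)), X, X ** 2])   # n x (2d + 1) design matrix
    alpha, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    def m(x):
        x = np.asarray(x, dtype=float)
        return alpha @ np.concatenate(([1.0], x, x ** 2))
    return alpha, m

# usage: f(x) = 3 + x_1 - 2 x_2^2 lies in the span of the basis,
# so the least-squares fit recovers it exactly
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 3 + X[:, 0] - 2 * X[:, 1] ** 2
alpha, m = fit_quadratic_model(X, y)
```

With n larger than the number of basis functions the system is overdetermined, which is exactly why the slide ends in "Least squares!".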
26. Polynomial Models - Simplified Setting
after we build a model m(x) ≈ f(x) we can minimize it!
the minimum of m(x) defines a new point x_{n+1} and we can evaluate f(x_{n+1})
Q: what should we do next? – any suggestions are welcome!
[back to Colab]
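One possible next step, sketched below as a naive scheme (not necessarily the lecture's intended answer): minimize the fitted separable quadratic in closed form, evaluate f there, and then the new point could be added to the sample set and the model refit:

```python
import numpy as np

def model_based_step(f, X, y):
    """Fit the separable quadratic m(x) = c + sum_i (b_i x_i + a_i x_i^2)
    by least squares, minimize it coordinate-wise, and evaluate f there."""
    n, d = X.shape
    Phi = np.hstack([np.ones((n, 1)), X, X ** 2])
    coef = np.linalg.lstsq(Phi, y, rcond=None)[0]
    b, a = coef[1:1 + d], coef[1 + d:]
    # each 1-d piece b_i x_i + a_i x_i^2 is minimized at -b_i / (2 a_i)
    # when a_i > 0; fall back to 0 otherwise (model unbounded below)
    x_new = np.where(a > 0, -b / (2 * np.maximum(a, 1e-12)), 0.0)
    return x_new, f(x_new)

# usage: for f(x) = (x_1 - 2)^2 + (x_2 + 1)^2 the model class is exact,
# so a single step lands on the true minimizer (2, -1)
f = lambda x: (x[0] - 2) ** 2 + (x[1] + 1) ** 2
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 2))
y = np.array([f(x) for x in X])
x_new, f_new = model_based_step(f, X, y)
```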
29. Motivation
for any analytic function f(x) we know that the Taylor approximation is locally good
for a given point x_k and radius ∆ we have that ∀x ∈ B(x_k, ∆) it holds
f(x) ≈ f(x_k) + ⟨∇f(x_k), x − x_k⟩ + (1/2) (x − x_k)^T ∇²f(x_k) (x − x_k)
given n = (d + 1)(d + 2)/2 points we can pick a polynomial model
m(x) = Σ_i α_i φ_i(x)
with
φ(x) = [1, x_1, . . . , x_d, x_1², . . . , x_d², x_1 x_2, . . . , x_{d−1} x_d]^T
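A quick sanity check that the basis above indeed has n = (d + 1)(d + 2)/2 entries: 1 constant, d linear terms, d squares, and C(d, 2) cross terms:

```python
from math import comb

def full_quadratic_basis_size(d):
    """1 constant + d linear + d squares + C(d, 2) cross terms."""
    return 1 + d + d + comb(d, 2)

# the count from the slide for d = 1, ..., 5
sizes = [full_quadratic_basis_size(d) for d in range(1, 6)]
```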
31. Trust-Region Radius ∆
we build the model m_k(x) using only points in B(x_k, ∆)
we may need to evaluate f(x) at multiple new points that we may sample from B(x_k, ∆)
let s_k ∈ B(0, ∆) be such that it minimizes the model m_k(x_k + s)
Q: what do we expect about the following quantity?
ρ_k = (f(x_k) − f(x_k + s_k)) / (m_k(x_k) − m_k(x_k + s_k))
if ρ_k ≥ η_1 (with 0 < η_1 < 1) we can trust the model and we can increase ∆
otherwise we have multiple options – improve the model m_k(x) or even decrease ∆
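The ratio test can be sketched as follows; the constant η_1 and the growth/shrink factors are illustrative choices, not values fixed by the slide:

```python
import numpy as np

def trust_region_update(f, m, x, s, delta,
                        eta1=0.1, gamma_inc=2.0, gamma_dec=0.5):
    """One trust-region acceptance test: compare actual vs. predicted
    decrease and adjust the radius delta accordingly."""
    actual = f(x) - f(x + s)          # decrease of the true function
    predicted = m(x) - m(x + s)       # decrease promised by the model
    rho = actual / predicted
    if rho >= eta1:                   # model is trustworthy: accept step, grow delta
        return x + s, delta * gamma_inc
    return x, delta * gamma_dec       # reject the step and shrink delta

# usage: a model that matches f well near x gives rho ≈ 1,
# so the step is accepted and delta grows
f = lambda x: np.sum(x ** 2)
m = lambda x: np.sum(x ** 2)          # perfect model, for illustration only
x = np.array([1.0, 1.0])
s = np.array([-0.5, -0.5])
x_new, delta_new = trust_region_update(f, m, x, s, delta=1.0)
```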
33. Bibliography
This lecture was based on [LMW19].
[LMW19] Jeffrey Larson, Matt Menickelly, and Stefan M. Wild. Derivative-free optimization methods. Acta Numerica, 28:287–404, 2019.