PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Jialin Liu
advised by Olivier Teytaud & Marc Schoenauer
TAO, Inria, Univ. Paris-Saclay, UMR CNRS 8623, France
December 11, 2015
Motivation
Why noisy optimization (i.e. optimization in front of a stochastic model)?
Not that many works on noisy optimization
faults in networks: you cannot use an average over 50 years
(many lines would be 100% guaranteed) ⇒ you need a (stochastic) model of faults
Why adversarial (i.e. worst-case) problems?
Critical problems with uncertainties
(technological breakthroughs, CO2 penalization, ...)
Why portfolio (i.e. combining/selecting solvers) ?
Great in combinatorial optimization → let us generalize :)
Why MCTS ?
Great recent tool
Still many things to do
All related ?
All applicable to games
All applicable to power systems
Nash ⇒ mixed strategy portfolio
Outline
1 Motivation
2 Noisy Optimization
Optimization criteria for black-box noisy optimization
Optimization methods
Resampling methods
Pairing
3 Portfolio and noisy optimization
Portfolio: state of the art
Relationship between portfolio and noisy optimization
Portfolio of noisy optimization methods
Conclusion
4 Adversarial portfolio
Adversarial bandit
Adversarial Framework
State-of-the-art
Contribution for computing Nash Equilibrium
Sparsity: sparse NE can be computed faster
Parameter-free adversarial bandit for large-scale problems
Application to robust optimization (power systems)
Application to games
Conclusion
5 Conclusion
Black-box Noisy Optimization Framework
f : x → f(x, ω)
defined on a domain D ⊂ R^d (continuous optimization), with values in R, where ω is a random variable.

Goal
x∗ = argmin_{x ∈ R^d} Eω f(x, ω)
i.e. access to independent evaluations of f.

Black-box case:
→ do not use any internal property of f
→ access to f(x, ω) only, not Eω f(x, ω)
→ for a given x: the oracle randomly samples ω and returns f(x, ω)
→ for its n-th request, it returns f(x, ω_n)
Optimization criteria: State-of-the-art
Noise-free case: log-linear convergence [Auger, 2005, Rechenberg, 1973]
log ||x_n − x∗|| / n ∼ A < 0   (1)

Noisy case: log-log convergence [Fabian, 1967]
log ||x_n − x∗|| / log(n) ∼ A < 0   (2)

Figure: y-axis: log ||x_n − x∗||; x-axis: #eval for log-linear convergence in the noise-free case, or log #eval for log-log convergence in the noisy case.
Optimization criteria: Convergence rates
Slopes for Uniform Rate, Simple Regret¹ [Bubeck et al., 2011] and Cumulative Regret

x∗: the optimum of f
x_n: the n-th evaluated search point
x̃_n: the optimum estimated after the n-th evaluation

Uniform Rate: UR_n = ||x_n − x∗||   → all search points matter
Simple Regret: SR_n = Eω f(x̃_n, ω) − Eω f(x∗, ω)   → the final recommendation matters
Cumulative Regret: CR_n = Σ_{j≤n} (Eω f(x_j, ω) − Eω f(x∗, ω))   → all recommendations matter

Convergence rates:
Slope(UR) = lim sup_{n→∞} log(UR_n) / log(n)   (3)
Slope(SR) = lim sup_{n→∞} log(SR_n) / log(n)   (4)
Slope(CR) = lim sup_{n→∞} log(CR_n) / log(n)   (5)

¹ Simple Regret = difference between the expected payoff of the recommendation and that of the optimum.
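The slope definitions above can be checked numerically. Below is a small sketch (the helper name and the test sequence are illustrative, not from the thesis): fitting log(SR_n) against log(n) by least squares recovers the exponent of a polynomially decaying regret sequence.

```python
import math

def slope(values):
    """Least-squares slope of log(values[n-1]) versus log(n).

    Empirical counterpart of lim sup log(SR_n)/log(n): for a sequence
    behaving like C * n^a, the fitted slope approaches a.
    """
    xs = [math.log(n + 1) for n in range(len(values))]
    ys = [math.log(v) for v in values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# SR_n = 1/sqrt(n) exhibits slope -1/2, the classical rate under
# constant-variance noise.
sr = [1.0 / math.sqrt(n) for n in range(1, 10001)]
print(round(slope(sr), 3))  # -0.5
```

For an exactly polynomial sequence the fit is exact up to floating point; on real runs the fitted slope is only an estimate of the lim sup.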
Tricks for handling noise:
Resampling: average multiple evaluations
Large population
Surrogate models
Specific methods (e.g. stochastic gradient descent with finite differences)
Here: focus on resampling.
Resampling number: how many times do we resample the noise?
Resampling methods: Non-adaptive resampling methods
[Recall] log-log convergence: log ||x_n − x∗|| / log(n) ∼ A < 0, where n is the evaluation number

Non-adaptive rules:
Exponential rules with ad hoc parameters
⇒ log-log convergence (mathematically proved by us)
Other rules as a function of #iter: square-root, linear, polynomial rules
Other rules as a function of #iter and dimension
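As a concrete instance of an exponential rule, the sketch below runs a (1+1)-ES on a noisy sphere and resamples ⌈ζ^n⌉ times at iteration n. Everything here is an illustrative stand-in (the toy solver, ζ = 1.2, the step-size adaptation constants), not the thesis' tuned parameters.

```python
import math
import random

random.seed(0)

def noisy_sphere(x):
    # constant-variance noise: the hard case (z = 0)
    return sum(t * t for t in x) + random.gauss(0.0, 1.0)

def avg_eval(x, r):
    # resampling: average r independent evaluations
    return sum(noisy_sphere(x) for _ in range(r)) / r

def es_with_exponential_resampling(dim=2, iters=40, zeta=1.2):
    x, step = [1.0] * dim, 0.5
    for n in range(1, iters + 1):
        r = math.ceil(zeta ** n)  # exponential resampling rule
        y = [t + random.gauss(0.0, step) for t in x]
        if avg_eval(y, r) < avg_eval(x, r):
            x, step = y, step * 1.1   # success: move and enlarge the step
        else:
            step *= 0.9               # failure: shrink the step
    return x

x = es_with_exponential_resampling()
print(sum(t * t for t in x))  # true (noise-free) fitness of the final point
```

Early iterations are cheap but unreliable; late iterations average away the noise, which is what makes the log-log convergence possible.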
Resampling methods: Adaptive resampling methods
Adaptive rules: Bernstein [Mnih et al., 2008, Heidrich-Meisner and Igel, 2009]
Here:
for each pair of search points x, x' to be compared do
    while computation time is not elapsed do
        1000 resamplings for x and x'
        if mean(difference) >> std then
            break
        end if
    end while
end for
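A minimal sketch of the comparison loop above (the function name, batch size reuse, and the 3-standard-error stopping constant are our illustrative choices): resample both points in batches and stop as soon as the mean difference dominates its standard error.

```python
import math
import random

random.seed(1)

def compare(f, x, y, batch=1000, max_batches=50, c=3.0):
    """Batch-resample f(x) and f(y) until the mean difference clearly
    dominates its standard error (Bernstein-race flavour), or the
    budget is exhausted. Returns True if x looks better (smaller)."""
    diffs = []
    for _ in range(max_batches):
        diffs += [f(x) - f(y) for _ in range(batch)]
        m = sum(diffs) / len(diffs)
        var = sum((d - m) ** 2 for d in diffs) / (len(diffs) - 1)
        sem = math.sqrt(var / len(diffs))  # standard error of the mean
        if abs(m) > c * sem:               # mean(difference) >> std
            break
    return m < 0

noisy = lambda t: t * t + random.gauss(0.0, 1.0)
print(compare(noisy, 0.1, 0.5))  # True: 0.1 has the smaller expected value
```

The closer the two expected values, the more batches are consumed, which is exactly why comparing points of similar fitness is expensive (a point made again for the lag mechanism later).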
Resampling methods: Comparison
With Continuous Noisy Optimization (CNO)
With Evolution Strategies (ES)
With Differential Evolution (DE)
Comparison with CNO
For Continuous Noisy Optimization, we propose the
Iterative Noisy Optimization Algorithm (INOA)
as a general framework for noisy optimization.

Key points:
a Sampler, which chooses a sampling around the current approximation;
an Opt module, which updates the approximation of the optimum;
resampling number r_n = B n^β and sampling step-size σ_n = A/n^α.

Main application: finite-difference sampling + quadratic model
Comparison with CNO: State-of-the-art and our results
3 types of noise: variance constant, linear or quadratic as a function of the SR:
Var(f(x, ω)) = O([Eω f(x, ω) − Eω f(x∗, ω)]^z)   (6)
with z ∈ {0, 1, 2}.

z = 0 (constant var):
  optimized for CR: slope(SR) = −1/2, slope(CR) = 1/2
  optimized for SR: slope(SR) = −2/3, slope(CR) = 2/3
  [Fabian, 1967], [Dupač, 1957], [Shamir, 2013]
z = 0 and ∞-differentiable: slope(SR) = −1 [Fabian, 1967]
z = 0 and "quadratic": slope(SR) = −1 [Dupač, 1957]
z = 1 (linear var):
  slope(SR) = −1, slope(CR) = 0, whether optimized for CR or for SR
  [Rolet and Teytaud, 2010]
z = 2 (quadratic var):
  slope(SR) = −∞, slope(CR) = 0, whether optimized for CR or for SR
  [Jebalia and Auger, 2008]

Table: State of the art: convergence rates. Blue: existing results, which we also achieved. Red: new results by us.

Main application: finite-difference sampling + quadratic model
Various (new, proved) rates depending on assumptions
Recovers existing rates (with the same algorithm) and beyond
Comparison with CNO: Results & Discussion
Our proposed algorithm (provably) reaches the same rate
as the Kiefer-Wolfowitz algorithm when the noise has constant variance;
as Bernstein-race optimization algorithms when the noise variance decreases linearly as a function of the simple regret;
as Evolution Strategies when the noise variance decreases quadratically as a function of the simple regret.
⇒ no details here; focus on ES and DE.
What about evolutionary algorithms? Experiments with noise variance = constant (hard case)

Algorithms:
ES + resampling
DE + resampling

Results: slope(SR) = −1/2 in both cases
(with e.g. rules depending on #iter and dimension)

Figure: Modified function F4 of CEC 2005, dimension 2. x-axis: log(#eval); y-axis: log(SR). Curves: N1.01exp, N1.1exp, N2exp, Nscale.
Resampling methods: Partial conclusion
Conclusion:
Adaptation of Newton's algorithm for noisy fitness (gradient ∇f and Hessian Hf approximated by finite differences + resamplings)
→ leads to fast convergence rates + recovers many rates in one algorithm + generic framework (but no proved application besides the quadratic surrogate model)
Non-adaptive methods lead to log-log convergence (math + experiments) in ES
N_scale = d^{−2} exp(4n/(5d)) works (slope(SR) = −1/2) for both ES and DE
(nb: −1 possible with large mutation + small inheritance)

In progress:
Adaptive resampling methods might be merged with bounds on resampling numbers ⇒ in progress, unclear benefit for the moment.
Variance reduction techniques
Monte Carlo [Hammersley and Handscomb, 1964, Billingsley, 1986]
Ê f(x, ω) = (1/n) Σ_{i=1}^n f(x, ω_i) → Eω f(x, ω).   (7)

Quasi Monte Carlo [Cranley and Patterson, 1976, Niederreiter, 1992, Wang and Hickernell, 2000, Mascagni and Chi, 2004]
Use samples aimed at being as uniform as possible over the domain.
Variance reduction techniques: white-box
Antithetic variates
Ensure some regularity of the sampling by using symmetries:
Êω f(x, ω) = (1/n) Σ_{i=1}^{n/2} (f(x, ω_i) + f(x, −ω_i)).

Importance sampling
Instead of sampling ω with density dP, we sample ω with density dP′:
Êω f(x, ω) = (1/n) Σ_{i=1}^n [dP(ω_i)/dP′(ω_i)] f(x, ω_i).

Control variates
Instead of estimating Eω f(x, ω), we estimate Eω (f(x, ω) − g(x, ω)),
using Eω f(x, ω) = Eω g(x, ω) [term A] + Eω (f(x, ω) − g(x, ω)) [term B].
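A small numerical check of antithetic variates (the quadratic integrand and sample sizes are our illustrative choices): pairing ω with −ω cancels the odd part of f, so the paired estimator has lower variance.

```python
import random

random.seed(0)

def f(x, w):
    # (x + w)^2 = x^2 + 2xw + w^2: the odd term 2xw is cancelled by pairing
    return (x + w) ** 2

def plain_mc(x, n):
    return sum(f(x, random.gauss(0.0, 1.0)) for _ in range(n)) / n

def antithetic_mc(x, n):
    ws = [random.gauss(0.0, 1.0) for _ in range(n // 2)]
    return sum(f(x, w) + f(x, -w) for w in ws) / n

def estimator_variance(est, runs=2000, n=100):
    vals = [est(1.0, n) for _ in range(runs)]
    m = sum(vals) / runs
    return sum((v - m) ** 2 for v in vals) / runs

v_plain = estimator_variance(plain_mc)      # theory: (2 + 4x^2)/n = 0.06
v_anti = estimator_variance(antithetic_mc)  # theory: 2*Var(w^2)/n = 0.04
print(v_plain > v_anti)  # True
```

Both estimators use n evaluations; the gain comes purely from how the noise samples are arranged, not from extra budget.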
Variance reduction techniques: grey-box
Common random numbers (CRN), or pairing
Use the same samples ω_1, ..., ω_{m_n} for the whole population x_{n,1}, ..., x_{n,λ}.
Seed_n = {seed_{n,1}, ..., seed_{n,m_n}}.
Eω f(x_{n,k}, ω) is then approximated as
(1/m_n) Σ_{i=1}^{m_n} f(x_{n,k}, seed_{n,i}).

Different forms of pairing:
Seed_n is the same for all n
m_n increases and the sets Seed_n are nested, i.e. ∀n, i ≤ m_n: m_{n+1} ≥ m_n and seed_{n,i} = seed_{n+1,i}
all individuals in an offspring use the same seeds, and seeds are 100% changed between offspring
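The effect of common random numbers can be sketched as follows (the additive-noise objective is our illustrative choice; with purely additive noise the paired difference is exactly noise-free, the extreme case of the variance reduction):

```python
import random

def f(x, w):
    # grey-box objective: the noise enters through a simulable sample w
    return (x - 1.0) ** 2 + w

def diff_independent(x, y, n, rng):
    fx = sum(f(x, rng.gauss(0.0, 1.0)) for _ in range(n)) / n
    fy = sum(f(y, rng.gauss(0.0, 1.0)) for _ in range(n)) / n
    return fx - fy

def diff_paired(x, y, n, rng):
    ws = [rng.gauss(0.0, 1.0) for _ in range(n)]  # common random numbers
    return sum(f(x, w) - f(y, w) for w in ws) / n

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

rng = random.Random(0)
ind = [diff_independent(1.1, 1.2, 20, rng) for _ in range(500)]
par = [diff_paired(1.1, 1.2, 20, rng) for _ in range(500)]
print(var(ind), var(par))  # pairing collapses the comparison noise
```

The comparison f(x) vs f(y) is what selection-based optimizers consume, so reducing the variance of the difference, rather than of each value, is what matters.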
Pairing: Partial conclusion
No details, just our conclusion:
"almost" black-box
easy to implement
applicable to most applications
On a realistic problem, pairing provided a great improvement,
but there are counterexamples in which it is detrimental.
Portfolio of optimization algorithms
Usually:
Portfolio → Combinatorial Optimization (SAT competition)
Recently:
Portfolio → Continuous Optimization [Baudiš and Pošík, 2014]
This work:
Portfolio → Noisy Optimization
→ Portfolio = choosing, online, between several algorithms
Why portfolio in Noisy Optimization?
Stochastic problem
limited budget (time or total number of evaluations)
target: anytime convergence to the optimum
black-box²

How to choose a suitable solver?
Algorithm Portfolios:
automatically select the best in a finite set of solvers

² Image from http://ethanclements.blogspot.fr/2010/12/postmodernism-essay-question.html
Portfolio of noisy optimization methods: proposal
A finite number of given noisy optimization solvers, “orthogonal”
Unfair distribution of budget
Information sharing (not very helpful here...)
→ Performs almost as well as the best solver
Portfolio of noisy optimization methods: NOPA
Algorithm 1 Noisy Optimization Portfolio Algorithm (NOPA).
1: Input noisy optimization solvers Solver_1, Solver_2, ..., Solver_M
2: Input a lag function LAG: N+ → N+
3: Input a non-decreasing integer sequence r_1, r_2, ...    Periodic comparisons
4: Input a non-decreasing integer sequence s_1, s_2, ...    Numbers of resamplings
5: n ← 1    Number of selections
6: m ← 1    NOPA's iteration number
7: i∗ ← null    Index of recommended solver
8: x∗ ← null    Recommendation
9: while budget is not exhausted do
10:   if m ≥ r_n then
11:     i∗ ← argmin_{i ∈ {1,...,M}} Ê_{s_n}[f(x̃_{i,LAG(r_n)})]    Algorithm selection
12:     n ← n + 1
13:   else
14:     for i ∈ {1, ..., M} do
15:       Apply one evaluation to Solver_i
16:     end for
17:     m ← m + 1
18:   end if
19:   x∗ ← x̃_{i∗,m}    Update recommendation
20: end while
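The loop of Algorithm 1 can be sketched in a few lines of Python. Everything concrete below is an illustrative stand-in: the toy best-so-far solver, and the schedules r_n = n², s_n = 10n, LAG(n) = ⌊√n⌋, which replace the "universal" sequences of the theorem, not the thesis' actual choices.

```python
import math
import random

random.seed(2)

def noisy_f(x):
    return sum(t * t for t in x) + random.gauss(0.0, 0.1)

class ToySolver:
    """Stand-in noisy solver: best-so-far random search; history[m] plays
    the role of its recommendation after m iterations."""
    def __init__(self, scale):
        self.scale = scale
        self.best = [1.0, 1.0]
        self.history = [list(self.best)]
    def step(self):
        cand = [b + random.gauss(0.0, self.scale) for b in self.best]
        if noisy_f(cand) < noisy_f(self.best):
            self.best = cand
        self.history.append(list(self.best))

def nopa(solvers, budget=3000):
    lag = lambda n: max(1, int(math.sqrt(n)))  # lag function
    r = lambda n: n * n                        # comparison schedule r_n
    s = lambda n: 10 * n                       # resampling schedule s_n
    n, m, i_star, spent = 1, 1, 0, 0
    while spent < budget:
        if m >= r(n):
            def score(sv):  # compare *lagged* recommendations, s_n resamplings
                x = sv.history[min(lag(r(n)), m - 1)]
                return sum(noisy_f(x) for _ in range(s(n))) / s(n)
            i_star = min(range(len(solvers)), key=lambda i: score(solvers[i]))
            spent += len(solvers) * s(n)
            n += 1
        else:
            for sv in solvers:  # fair budget: one step for every solver
                sv.step()
            spent += 2 * len(solvers)
            m += 1
    return solvers[i_star].history[-1]

x_star = nopa([ToySolver(0.5), ToySolver(0.05)])
print(sum(t * t for t in x_star))  # fitness of the portfolio's recommendation
```

The key structural point survives the simplification: comparison evaluations are spent on old (lagged) recommendations, so their cost stays negligible next to the optimization budget.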
Portfolio of noisy optimization methods: compare solvers early
lag function:
LAG(n) ≤ n: lag
∀i ∈ {1, ..., M}: x̃_{i,LAG(n)} may or may not equal x̃_{i,n}

Why this lag?
the algorithms' ranking is usually stable → no use comparing the very last points
it is much cheaper to compare old points:
comparing good (i.e. recent) points → comparing points with similar fitness
comparing points with similar fitness → very expensive
Portfolio of noisy optimization methods: Theorem with fair budget distribution
Theorem (fair budget distribution)
Assume that
each solver i ∈ {1, ..., M} has simple regret SR_{i,n} = (1 + o(1)) C_i / n^{α_i} (as usual),
and that the noise variance is constant.
Then for some universal r_n, s_n, LAG_n, almost surely there exists n_0 such that, for n ≥ n_0:
the portfolio always chooses an optimal solver (optimal α_i and C_i);
the portfolio uses ≤ M · r_n (1 + o(1)) evaluations ⇒ M times more than the best solver.

Interpretation
Negligible comparison budget (thanks to the lag)
On classical log-log graphs, the portfolio should perform similarly to the best solver, within the log(M) shift (proved)
INOPA: introducing an unfair budget
NOPA: same budget for all solvers.
Remark:
we compare old recommendations (LAG(n) << n)
they were known long ago, before spending all this budget
therefore, except for the selected solver, most of the budget is wasted :(
⇒ Lazy evaluation paradigm: evaluate f(·) only when you need it for your output
⇒ Improved NOPA (INOPA): unfair budget distribution
Use only LAG(r_n) evaluations (negligible) on the sub-optimal solvers (INOPA)
log(M′) shift, with M′ the number of optimal solvers (proved)
Experiments: Unimodal case
Noisy Optimization Algorithms (NOAs):
SA-ES: Self-Adaptive Evolution Strategy
Fabian's algorithm: a first-order method using gradients estimated by finite differences [Dvoretzky et al., 1956, Fabian, 1967]
Noisy Newton's algorithm: a second-order method using a Hessian matrix also approximated by finite differences (our contribution in CNO)
Solvers z = 0 (constant var) z = 1 (linear var) z = 2 (quadratic var)
RSAES .114 ± .002 .118 ± .003 .113 ± .003
Fabian1 −.838 ± .003 −1.011 ± .003 −1.016 ± .003
Fabian2 .108 ± .003 −1.339 ± .003 −2.481 ± .003
Newton −.070 ± .003 −.959 ± .092 −2.503 ± .285
NOPA no lag −.377 ± .048 −.978 ± .013 −2.106 ± .003
NOPA −.747 ± .003 −.937 ± .005 −2.515 ± .095
INOPA −.822 ± .003 −1.359 ± .027 −3.528 ± .144
Table: Slope(SR) for f(x) = ||x||² + ||x||^z N in dimension 15. Computation time = 40s.
Experiments: Stochastic unit commitment problem
Solver d = 45 d = 63 d = 105 d = 125
RSAES .485 ± .071 .870 ± .078 .550 ± .097 .274 ± .097
Fabian1 1.339 ± .043 1.895 ± .040 1.075 ± .047 .769 ± .047
Fabian2 .394 ± .058 .521 ± .083 .436 ± .097 .307 ± .097
Newton .749 ± .101 1.138 ± .128 .590 ± .147 .312 ± .147
INOPA .394 ± .059 .547 ± .080 .242 ± .101 .242 ± .101
Table: Stochastic unit commitment problem (minimization). Computation time = 320s.
What's more:
Given the same budget, an INOPA of identical solvers can outperform its mono-solvers.
Portfolio and noisy optimization: Conclusion
Main conclusion:
portfolios are also great in noisy optimization
(because in noisy optimization, with the lag, the comparison cost is small)
We show mathematically and empirically a log(M) shift when using M solvers, on a classical log-log scale
Bound improved to a log(M′) shift, with M′ = number of optimal solvers, under unfair distribution of budget (INOPA)

Take-home messages
portfolio = little overhead
unfair budget = no overhead if "orthogonal" portfolio (orthogonal → M′ = 1)
We mathematically confirmed the idea of orthogonality found in [Samulowitz and Memisevic, 2007]
Framework: Zero-sum matrix games
Game defined by matrix M
I choose (privately) i
Simultaneously, you choose (privately) j
I earn M_{i,j}
You earn −M_{i,j}
So this is zero-sum.

Figure: 0-sum matrix game.

          rock   paper   scissors
rock      0.5    0       1
paper     1      0.5     0
scissors  0      1       0.5
Table: Example of 1-sum matrix game: Rock-paper-scissors.
Framework: Nash Equilibrium (NE)
Definition (Nash Equilibrium)
Zero-sum matrix game M:
My strategy = probability distribution on rows = x
Your strategy = probability distribution on columns = y
Expected reward = xᵀMy
There exist x∗, y∗ such that ∀x, y:
xᵀMy∗ ≤ x∗ᵀMy∗ ≤ x∗ᵀMy.   (8)
(x∗, y∗) is a Nash Equilibrium (no unicity).

Definition (Approximate ε-Nash Equilibria)
(x∗, y∗) such that
xᵀMy∗ − ε ≤ x∗ᵀMy∗ ≤ x∗ᵀMy + ε.   (9)

Example: The NE of Rock-paper-scissors is unique: (1/3, 1/3, 1/3).
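The Rock-paper-scissors example can be checked numerically: against the uniform strategy, every pure strategy of either player earns exactly the equilibrium value, so no deviation is profitable (the 1-sum payoff table is the one shown above).

```python
# Row player maximizes M; column player maximizes 1 - M (1-sum game).
M = [[0.5, 0.0, 1.0],   # rock
     [1.0, 0.5, 0.0],   # paper
     [0.0, 1.0, 0.5]]   # scissors
x = y = [1.0 / 3] * 3   # candidate Nash Equilibrium

value = sum(x[i] * M[i][j] * y[j] for i in range(3) for j in range(3))
row_best = max(sum(M[i][j] * y[j] for j in range(3)) for i in range(3))
col_best = min(sum(x[i] * M[i][j] for i in range(3)) for j in range(3))
print(round(value, 12), round(row_best, 12), round(col_best, 12))  # 0.5 0.5 0.5
```

Since the best pure-strategy payoff on each side equals the value x∗ᵀMy∗, the two inequalities of (8) hold (here with equality), confirming the equilibrium.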
Methods for computing Nash Equilibrium
Algorithm                             Complexity         Exact solution?   Confidence   Time
LP [von Stengel, 2002]                O(K^α), α > 6      yes               1            constant
[Grigoriadis and Khachiyan, 1995]     O(K log(K)/ε²)     no                1            random
[Grigoriadis and Khachiyan, 1995],
  with K/log(K) processors            O(log²(K)/ε²)      no                1            random
EXP3 [Auer et al., 1995]              O(K log(K)/ε²)     no                1 − δ        constant
Inf [Audibert and Bubeck, 2009]       O(K log(K)/ε²)     no                1 − δ        constant
Our algorithm (if NE is k-sparse)     O(k^{3k} K log K)  yes               1 − δ        constant

Table: State of the art for computing a Nash Equilibrium of a zero-sum matrix game M_{K×K}.
Algorithm 2 Exp3.P: variant of Exp3. η and γ are two parameters.
1: Input η ∈ R    how much the distribution becomes peaked
2: Input γ ∈ (0, 1]    exploration rate
3: Input a time horizon (computational budget) T ∈ N+ and the number of arms K ∈ N+
4: Output a Nash-optimal policy p
5: y ← 0
6: for i ← 1 to K do    initialization
7:   ω_i ← exp((ηγ/3) √(T/K))
8: end for
9: for t ← 1 to T do
10:   for i ← 1 to K do
11:     p_i ← (1 − γ) ω_i / Σ_{j=1}^K ω_j + γ/K
12:   end for
13:   Generate i_t according to (p_1, p_2, ..., p_K)
14:   Compute reward R_{i_t,t}
15:   for i ← 1 to K do
16:     if i == i_t then
17:       R̂_i ← R_{i_t,t} / p_i
18:     else
19:       R̂_i ← 0
20:     end if
21:     ω_i ← ω_i exp((γ/(3K)) (R̂_i + η/(p_i √(TK))))
22:   end for
23: end for
24: Return probability distribution (p_1, p_2, ..., p_K)
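A direct Python transcription of Algorithm 2, run in self-play on the Rock-paper-scissors matrix. The η and γ values, the horizon, and the self-play setup are our illustrative choices (Exp3.P is normally analysed against an arbitrary adversary); the time-averaged policies should drift toward the uniform equilibrium.

```python
import math
import random

random.seed(3)

def exp3p(K, T, eta=0.4, gamma=0.2):
    """Exp3.P as in Algorithm 2; eta/gamma are illustrative, not tuned."""
    w = [math.exp((eta * gamma / 3) * math.sqrt(T / K)) for _ in range(K)]
    def policy():
        s = sum(w)
        return [(1 - gamma) * wi / s + gamma / K for wi in w]
    def update(i_t, r):
        p = policy()
        for i in range(K):
            r_hat = r / p[i] if i == i_t else 0.0   # importance-weighted reward
            w[i] *= math.exp((gamma / (3 * K)) *
                             (r_hat + eta / (p[i] * math.sqrt(T * K))))
    return policy, update

M = [[0.5, 0.0, 1.0], [1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]  # Rock-paper-scissors
T = 5000
prow, urow = exp3p(3, T)
pcol, ucol = exp3p(3, T)
avg = [0.0, 0.0, 0.0]
for t in range(T):
    pr, pc = prow(), pcol()
    i = random.choices(range(3), weights=pr)[0]
    j = random.choices(range(3), weights=pc)[0]
    urow(i, M[i][j])        # row player maximizes M
    ucol(j, 1.0 - M[i][j])  # column player maximizes 1 - M (1-sum game)
    for k in range(3):
        avg[k] += pr[k] / T
print([round(a, 2) for a in avg])  # time-averaged row policy
```

Note that each coordinate of the played policy is bounded below by γ/K, which is exactly the forced exploration that the sparsity/truncation step later exploits.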
Sparse Nash Equilibria (1/2)
Consider x∗ a Nash-optimal policy for a ZSMG M_{K×K}.
Let us assume that x∗ is unique and has at most k non-zero components (sparsity).
Let us show that x∗ is "discrete":
(Remark: Nash = solution of a linear programming problem)
⇒ x∗ is also a NE of a k × k submatrix M_{k×k}
⇒ x∗ is a solution of an LP in dimension k
⇒ x∗ is a solution of k linear equations with coefficients in {−1, 0, 1}
⇒ x∗ = inverse matrix × vector
⇒ x∗ is obtained by "cofactors / determinant of the matrix"
⇒ x∗ has denominator at most k^{k/2}
by the Hadamard determinant bound [Hadamard, 1893], [Brenner and Cummings, 1972]
Sparse Nash Equilibria (2/2)
Computation of sparse Nash Equilibria
Under the assumption that the Nash equilibrium is sparse:
x∗ is rational with a "small" denominator (previous slide!)
So let us compute an ε-Nash equilibrium (with ε small enough!) (sublinear time!)
And let us compute its closest approximation with a "small denominator" (Hadamard)

Two new algorithms for exact Nash:
Rounding-EXP3: switch to the closest approximation
Truncation-EXP3: remove small components and work on the remaining submatrix (exact solving)
(requested precision k^{−3k/2} only ⇒ complexity k^{3k} K log K)
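The rounding step can be illustrated with Python's `fractions` module. The ε-Nash vector and the denominator bound q_max = 6 below are made-up for the example; in the algorithm the bound comes from the Hadamard inequality of the previous slide.

```python
from fractions import Fraction

def round_to_small_denominator(p, q_max):
    # closest rational with denominator <= q_max, coordinate-wise
    # (the idea behind Rounding-EXP3)
    return [Fraction(v).limit_denominator(q_max) for v in p]

approx = [0.3332, 0.1671, 0.4997]   # an ε-Nash strategy from a bandit run
exact = round_to_small_denominator(approx, q_max=6)
print(exact)  # [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]
```

Once ε is smaller than half the gap between distinct rationals of denominator ≤ q_max, the rounding is guaranteed to land on the exact equilibrium, which is why a precision of only k^{−3k/2} suffices.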
Our proposal: Parameter-free adversarial bandit
No details here; in short:
We compare various existing parametrizations of EXP3
We select the best
We add sparsity as follows:
for a budget of T rounds of EXP3, threshold = max_{i ∈ {1,...,m}} (T x_i)^α / T
⇒ we get a parameter-free bandit for adversarial problems
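Read literally, the thresholding rule above can be sketched as follows (this is our reading of the rule; α and the example frequencies are illustrative):

```python
def truncate(x, T, alpha=0.7):
    """Drop arms whose empirical frequency x_i falls below the threshold
    max_i (T*x_i)^alpha / T, then renormalize the rest."""
    thr = max((T * xi) ** alpha for xi in x) / T
    kept = [xi if xi >= thr else 0.0 for xi in x]
    total = sum(kept)
    return [xi / total for xi in kept]

freqs = [0.48, 0.47, 0.03, 0.02]   # empirical frequencies after T rounds
print(truncate(freqs, T=10000))    # the two small components are removed
```

Because the threshold is computed from the run itself (via the largest empirical frequency), no problem-dependent parameter is left to tune, which is the sense in which the resulting bandit is parameter-free.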
Figure: decision-support chain. A simulator maps (policy, scenario) pairs to a reward R(k, s); from these rewards we extract the average cost/performance and the robustness of each policy.

Examples of scenario: CO2 penalization, gas curtailment in Eastern Europe, technological breakthrough
Examples of policy: massive nuclear power plant building, massive renewable energies, maintain a connection, create a new connection, ...
Nash-planning for scenario-based decision making
Decision tools

METHOD               EXTRACTION OF POLICIES   EXTRACTION OF CRITICAL SCENARIOS   COMPUTATIONAL COST          INTERPRETATION
Wald                 One                      One per policy                     K × S                       Nature decides later, minimizing our reward
Savage               One                      One per policy                     K × S                       Nature decides later, maximizing our regret
Scenarios            Handcrafted              Handcrafted                        K × S                       Human expertise
our proposal: Nash   Nash-optimal             Nash-optimal                       (K + S) × log(K + S) (*)    Nature decides privately, before us

Table: Comparison between several tools for decision under uncertainty. K = |K| and S = |S|. (*) improved if sparse, by our previous result!

⇒ in this case sparsity performs very well.
Nash ⇒ fast selection of scenarios and options: sparsity both
speeds up the NE computation and
makes the output more readable (smaller matrix)
Application to power investment problem: Testcase and parameterization
We consider a big toy problem:
3^10 investment policies (k)
3^9 scenarios (s)
reward: (k, s) → R(k, s)
We
use Nash equilibria, for their principled nature (Nature decides first and privately! that's reasonable, right?) and their low computational cost in large-scale settings
compute the equilibria thanks to a (tuned) EXP3...
... with sparsity, for
improving the precision
reducing the number of pure strategies in our recommendation (the matrix would be unreadable otherwise!)
50 / 76
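Computing an approximate Nash equilibrium by adversarial bandits can be sketched as follows. This is plain Exp3 self-play (Auer et al., 1995), not the tuned truncated.Exp3.P variant of the talk; the values of gamma and T are arbitrary choices for the example:

```python
# Hedged sketch: approximate Nash equilibrium of a zero-sum matrix game
# via Exp3 self-play. Rewards are assumed in [0, 1]; the row player gets
# M[i][j], the column player gets 1 - M[i][j].
import math
import random

def exp3_nash(M, T, gamma, rng):
    K, S = len(M), len(M[0])
    wr, wc = [1.0] * K, [1.0] * S      # weights, row / column player
    cr, cc = [0] * K, [0] * S          # play counts (empirical strategies)
    for _ in range(T):
        pr = [(1 - gamma) * w / sum(wr) + gamma / K for w in wr]
        pc = [(1 - gamma) * w / sum(wc) + gamma / S for w in wc]
        i = rng.choices(range(K), pr)[0]
        j = rng.choices(range(S), pc)[0]
        cr[i] += 1
        cc[j] += 1
        # importance-weighted reward estimates
        wr[i] *= math.exp(gamma * (M[i][j] / pr[i]) / K)
        wc[j] *= math.exp(gamma * ((1 - M[i][j]) / pc[j]) / S)
        wr = [w / max(wr) for w in wr]  # renormalize to avoid overflow
        wc = [w / max(wc) for w in wc]
    return [c / T for c in cr], [c / T for c in cc]

# Matching pennies: the unique Nash equilibrium is uniform play, so both
# empirical strategies should approach (0.5, 0.5).
p, q = exp3_nash([[1, 0], [0, 1]], T=20000, gamma=0.05, rng=random.Random(0))
```

In zero-sum games the time-averaged play of two no-regret learners approaches the Nash equilibrium set, which is why the empirical frequencies are returned rather than the final weights.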
55. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to robust optimization (power systems)
Application to power investment problem: Sparse-Nash algorithm
Algorithm 3 The Sparse-Nash algorithm for solving decision-under-uncertainty problems.
Input A family K of possible decisions k (investment policies).
Input A family S of scenarios s.
Input A mapping (k, s) → Rk,s, providing the rewards
Run truncated.Exp3.P on R, get
a probability distribution on K (support = key options) and
a probability distribution on S (support = critical scenarios).
Emphasize the policy with highest probability.
51 / 76
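The truncation inside truncated.Exp3.P can be sketched with one plausible rule; this exact threshold is an assumption for illustration, not necessarily the rule used in the thesis, but it reproduces the qualitative behavior of the tables (larger α gives sparser supports):

```python
# Hedged sketch of a truncation step: drop arms whose probability is below
# a fraction alpha of the largest one, then renormalize. Illustrative rule
# only; truncated.Exp3.P's exact threshold may differ.

def truncate(p, alpha):
    thr = alpha * max(p)
    kept = [x if x >= thr else 0.0 for x in p]
    z = sum(kept)
    return [x / z for x in kept]

p = [0.5, 0.3, 0.15, 0.05]
mild = truncate(p, 0.5)    # keeps the two largest arms, renormalized
harsh = truncate(p, 0.9)   # keeps only the largest arm: fully sparse
```

The support of the truncated distribution is exactly the set of "key options" (resp. "critical scenarios") that the algorithm emphasizes.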
56. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to robust optimization (power systems)
Application to power investment problem: Results
α
AVERAGE SPARSITY LEVEL OVER 3^10 = 59049 ARMS
T = K T = 10K T = 50K T = 100K T = 500K T = 1000K
0.1 13804 ± 52 non-sparse non-sparse non-sparse non-sparse non-sparse
0.3 2810 ± 59 non-sparse non-sparse non-sparse non-sparse non-sparse
0.5 396 ± 16 non-sparse non-sparse 59049 ± 197 49819 ± 195 non-sparse
0.7 43 ± 3 58925 ± 27 55383 ± 1507 46000 ± 278 9065 ± 160 non-sparse
0.9 4 ± 0 993 ± 64 797 ± 42 504 ± 25 98 ± 5 52633 ± 523
0.99 1 ± 0 2 ± 0 3 ± 0 2 ± 0 2 ± 0 7 ± 1
α
ROBUST SCORE: WORST REWARD AGAINST PURE STRATEGIES
T = K T = 10K T = 50K T = 100K T = 500K T = 1000K
NT 4.922e-01 4.928e-01 4.956e-01 4.991e-01 5.221e-01 4.938e-01
0.1 4.948e-01 4.928e-01 4.956e-01 4.991e-01 5.221e-01 4.938e-01
0.3 5.004e-01 4.928e-01 4.956e-01 4.991e-01 5.221e-01 4.938e-01
0.5 5.059e-01 4.928e-01 4.956e-01 4.991e-01 5.242e-01 4.938e-01
0.7 5.054e-01 4.928e-01 4.965e-01 5.031e-01 5.317e-01 4.938e-01
0.9 4.281e-01 5.137e-01 5.151e-01 5.140e-01 5.487e-01 4.960e-01
0.99 3.634e-01 4.357e-01 4.612e-01 4.683e-01 5.242e-01 5.390e-01
Pure 3.505e-01 3.946e-01 4.287e-01 4.489e-01 5.143e-01 4.837e-01
Table: Average sparsity level and robust score. α is the truncation parameter. T is the budget.
52 / 76
58. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to robust optimization (power systems)
Application to power investment problem: summary
Define long-term scenarios (plenty of them!)
Build a simulator R(k, s)
Classical solution (Savage): min_{k∈K} max_{s∈S} regret(k, s)
Our proposal (Nash): automatically select a submatrix
Our proposed tool has the following advantages:
Natural extraction of interesting policies and critical scenarios:
α = .7 provides stable (and proved) results,
but the extracted submatrix becomes easily readable (small enough) with larger
values of α.
Faster than Wald or Savage methodologies.
Take-home messages
We get a fast criterion, faster than Wald’s or Savage’s criteria, with a natural
interpretation, and more readable ⇒ but stochastic recommendation!
53 / 76
60. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Two parts:
Seeds matter: **choose** your seeds !
More tricky but worth the effort: position-specific seeds !
(towards a better asymptotic behavior of MCTS ?)
55 / 76
61. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Optimizing random seeds: Correlations
Figure: Success rate per seed (ranked) in 5x5 Domineering, with standard deviations on y-axis:
the seed has a significant impact.
Fact: the random seed matters !
56 / 76
62. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Optimizing random seeds: State-of-the-art
Stochastic algorithms randomly select their pseudo-random seed.
We propose to choose the seed(s), and to combine them.
State-of-the-art for combining random seeds:
[Nagarajan et al., 2015] combines several AIs
[Gaudel et al., 2010] uses Nash methods for combining several opening books
[Saint-Pierre and Teytaud, 2014] constructs several AIs from a single stochastic
one and combines them by the BestSeed and Nash approaches
57 / 76
63. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Trick: present results with one white seed per column and one black seed per
row
[K × K matrix of outcomes Mi,j: K random seeds for Black (rows) vs K random seeds for White (columns); the row player gets Mi,j, the column player gets 1 − Mi,j.]
Figure: One black seed per row, one white seed per column.
58 / 76
64. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Propositions: Nash & BestSeed
Nash
Nash = combines rows (more robust; we will see later)
BestSeed
BestSeed = just pick the best row / best column
59 / 76
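A minimal sketch of BestSeed on an invented 2 × 2 outcome matrix also shows why it can be overfitted: the row with the best average may have a poor worst case, which a Nash mixture hedges against:

```python
# Hedged sketch: BestSeed = pick the row (Black seed) with the best average
# outcome. The matrix is invented; entry M[i][j] is the win rate of Black
# seed i against White seed j.
M = [[1.0, 0.2],    # seed 0: best on average, but exploitable
     [0.55, 0.5]]   # seed 1: slightly worse average, robust

def best_seed(M):
    return max(range(len(M)), key=lambda i: sum(M[i]) / len(M[i]))

b = best_seed(M)
worst_case = min(M[b])   # BestSeed picks seed 0, whose worst case is 0.2
```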
65. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Better than squared matrices: rectangle methods
Remark:
for choosing a row, if #rows = #cols, then #rows is more critical than #cols;
for a given budget, increase #rows and decrease #cols (same budget!)
Figure: Left: square matrix of a game; right: rectangles of a game (K >> Kt ).
60 / 76
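The row-selection step of the rectangle method can be sketched as follows; the outcome model is invented (row i wins with probability (i + 1)/(K + 1)), purely so that higher rows really are stronger:

```python
# Hedged sketch of the rectangle idea: spend a budget of B games on K rows
# (candidate seeds) x Kt columns (opponent seeds) with K >> Kt, then keep
# the row with the best empirical mean.
import random

def pick_row(outcome, K, Kt):
    # outcome(i, j) in {0, 1}: result of row seed i vs column seed j
    means = [sum(outcome(i, j) for j in range(Kt)) / Kt for i in range(K)]
    return max(range(K), key=lambda i: means[i])

B, Kt = 1800, 30
K = B // Kt                        # 60 rows x 30 columns, same total budget
rng = random.Random(0)
best = pick_row(lambda i, j: int(rng.random() < (i + 1) / (K + 1)), K, Kt)
```

For the same budget B, the square design would only compare about 42 rows on 42 columns; the rectangle trades column precision for more candidate rows.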
66. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Does it work ? experiments on Domineering
The opponent uses seeds which have never been used during the learning of the
portfolio (cross-validation).
Figure: Results for domineering, with the BestSeed (left) and the Nash (right) approach, against the baseline (K = 1) and the
exploiter ( K > 1; opponent who “learns” very well). Kt = 900 in all experiments.
BestSeed performs well against the original algorithm (K = 1), but poorly against the exploiter (K > 1).
Nash outperforms the original algorithm both for K = 1 (all cases) and K > 1 (most cases).
61 / 76
67. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Beyond cross-validation: experiments with transfer in the game of Go
Learning: BestSeed is applied to GnuGo, with MCTS and a budget of 400
simulations.
Test: against “classical” GnuGo, i.e. the non-MCTS version of GnuGo.
Opponent Performance of BestSeed Performance with randomized seed
GnuGo-classical level 1 1. (± 0 ) .995 (± 0 )
GnuGo-classical level 2 1. (± 0 ) .995 (± 0 )
GnuGo-classical level 3 1. (± 0 ) .99 (± 0 )
GnuGo-classical level 4 1. (± 0 ) 1. (± 0 )
GnuGo-classical level 5 1. (± 0 ) 1. (± 0 )
GnuGo-classical level 6 1. (± 0 ) 1. (± 0 )
GnuGo-classical level 7 .73 (± .013 ) .061 (± .004 )
GnuGo-classical level 8 .73 (± .013 ) .106 (± .006 )
GnuGo-classical level 9 .73 (± .013 ) .095 (± .006 )
GnuGo-classical level 10 .73 (± .013 ) .07 (± .004 )
Table: Performance of “BestSeed” and “randomized seed” against “classical” GnuGo.
Previous slide: we win against the AI which we have trained (but different seeds!).
This slide: we improve the winning rate against another AI.
62 / 76
68. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Optimizing random seeds: Partial conclusion
Conclusion:
Seed optimization (NOT position-specific) can be seen as a simple and effective tool for building an opening book, with no development effort, no human expertise and no database storage.
“Rectangle” provides significant improvements.
The online computational overhead of the methods is negligible.
The boosted AIs significantly outperform the baselines.
BestSeed performs well, but can be overfitted ⇒ strength of Nash.
Further work:
The use of online bandit algorithms for dynamically choosing K/Kt .
Note:
The BestSeed and the Nash algorithms are not new.
The algorithm and analysis of rectangles is new.
The analysis of the impact of seeds is new.
The applications to Domineering, Atari-Go and Breakthrough are new.
63 / 76
69. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Two parts:
Seeds matter: **choose** your seeds !
More tricky but worth the effort: position-specific seeds !
(towards a better asymptotic behavior of MCTS ?)
64 / 76
71. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Optimizing position-based random seeds: Tsumego
Tsumego (by Yoji Ojima, Zen’s author)
Input: a Go position
Question: is this situation a win for White?
Output: yes or no
Why so important?
At the heart of many game algorithms
In Go, EXPTIME-complete [Robson, 1983]
65 / 76
73. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Classical algorithms
Monte Carlo (MC)
[Bruegmann, 1993, Cazenave, 2006, Cazenave and Borsboom, 2007]
Monte Carlo Tree Search (MCTS) [Bouzy, 2004, Coulom, 2006]
Nested MC [Cazenave, 2009]
Voting scheme among MCTS [Gavin et al., ]
⇒ here weighted voting scheme among MCTS
66 / 76
74. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Evaluation of the game value
Algorithm 4 Evaluation of the game value.
1: Input current state s
2: Input a policy πB for Black, depending on a seed in N+
3: Input a policy πW for White, depending on a seed in N+
4: for i ∈ {1, . . . , K} do
5: for j ∈ {1, . . . , K} do
6: Mi,j ← outcome of the game starting in s with πB playing as Black with seed
b(i) and πW playing as White with seed w(j)
7: end for
8: end for
9: Compute weights p for Black and q for White for the matrix M (either BestSeed,
Nash, or other)
10: Return p^T M q, the approximate value of the game M
67 / 76
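Step 10 of the algorithm, the weighted value p^T M q, can be sketched with toy weights and a toy matrix:

```python
# Hedged sketch of step 10: the weighted game value p^T M q for a seed
# outcome matrix M, with Black weights p and White weights q (toy values).
def game_value(M, p, q):
    return sum(p[i] * M[i][j] * q[j]
               for i in range(len(M)) for j in range(len(M[0])))

M = [[1, 0],
     [0, 1]]                                        # toy 2 x 2 outcome matrix
paired_mc = game_value(M, [0.5, 0.5], [0.5, 0.5])   # uniform weights -> 0.5
best_seed = game_value(M, [1.0, 0.0], [1.0, 0.0])   # point weights -> 1.0
```

With uniform p and q this is exactly the paired MC estimate (average of the matrix); with point masses on the best row/column it is the BestSeed estimate.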
75. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Classical case (MC/MCTS): unpaired Monte Carlo averaging
[Left: K^2 random seeds b(1), ..., b(K^2) for Black and K^2 random seeds w(1), ..., w(K^2) for White, one fresh pair per game. Right: K × K matrix of outcomes Mi,j; the row player gets Mi,j, the column player gets 1 − Mi,j.]
Figure: Left: unpaired case (classical estimate by averaging); right: paired case: K seeds vs K
seeds.
68 / 76
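The paired/unpaired distinction in the figure can be sketched as follows; the outcome function g is an arbitrary deterministic stand-in for a game between two seeded policies, not a real game:

```python
# Hedged sketch: unpaired MC draws a fresh seed pair for every game; paired
# MC reuses K Black seeds against K White seeds (a K x K outcome matrix).
import random

def g(b, w):
    return ((3 * b + 5 * w) % 7) / 6.0    # toy "outcome" for seeds (b, w)

K = 10
rng = random.Random(0)
unpaired = sum(g(rng.randrange(10**6), rng.randrange(10**6))
               for _ in range(K * K)) / (K * K)
black = [rng.randrange(10**6) for _ in range(K)]  # K seeds for Black (rows)
white = [rng.randrange(10**6) for _ in range(K)]  # K seeds for White (cols)
paired = sum(g(b, w) for b in black for w in white) / (K * K)
```

Both estimators average K^2 games; the paired one additionally exposes the full seed-vs-seed matrix that the reweighting methods need.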
78. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Experiments: Applied methods and setting
Compared methods for approximating v(s)
Three methods use K^2 indep. batches of M MCTS-simulations, using a matrix of seeds:
Nash reweighting = Nash value
BestSeed reweighting = intersection of best row / best column
Paired MC estimate = average of the matrix
One unpaired method: the classical MC estimate (the average of K^2 random MCTS runs)
Baseline: a single long MCTS (= state of the art!)
→ the only one which is not K^2-parallel
Parameter setting: GnuGo-MCTS [Bayer et al., 2008]
setting A: 1 000 simulations per move
setting B: 80 000 simulations per move
69 / 76
79. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Experiments: Average results over 50 Tsumego problems
[Two plots; x-axis: submatrix size (N^2), y-axis: performance (0.5 to 0.9); curves: Nash, Paired, Best, Unpaired, MCTS(1). (a) setting A: 1 000 simulations per move. (b) setting B: 80 000 simulations per move.]
Figure: Average over 50 Tsumego problems. x-axis: #simulations, y-axis: %correct answers.
MCTS(1): one single MCTS run using all the budget.
Setting A (small budget): MCTS(1) outperforms weighted average of 81 MCTS
runs (but we are more parallel !)
Setting B (large budget): we outperform MCTS and all others by far
⇒ consistent with the limited scalability of MCTS for huge number of sim.
70 / 76
82. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Application to games
Optimizing position-based random seeds: Partial conclusion
Main conclusion:
novel way of evaluating game values using Nash Equilibrium
(theoretical validation & experiments on 50 Tsumego problems)
The Nash or BestSeed predictor requires far fewer simulations to reach accurate results, and is sometimes consistent whereas the original MC estimate is not!
We outperformed
average of MCTS runs sharing the budget
a single MCTS using all the budget
→ For M large enough, our weighted averaging of 81 single MCTS runs with M
simulations is better than a MCTS run with 81M simulations :)
Take-home messages
We classify positions (“black wins” vs “white wins”).
We use a WEIGHTED average of K^2 MCTS runs of M simulations.
Our approach outperforms:
all tested voting schemes among K^2 MCTS estimates of M simulations,
and a pure MCTS of K^2 × M simulations,
when M is large and K^2 = 81.
71 / 76
85. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Adversarial portfolio
Conclusion
A work on sparsity, at the core of ZSMG (zero-sum matrix games)
A parameter-free adversarial bandit, obtained by tuning (no details provided in
this talk) + sparsity
Applications of ZSMG:
Nash + Sparsity → faster + more readable robust decision making
Random seeds = new MCTS variants ?
validated as opening book learning (Go, Atari-Go, Domineering, Breakthrough, Draughts, Phantom-Go, ...)
position-specific seeds validated on Tsumego
73 / 76
91. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Conclusion
Conclusion & Further work
Noisy opt:
An algorithm recovering most (but not all: Fabian's rate!) existing results, extended to other surrogate models
ES/DE with resamplings have good rates for linear/quadratic variance and/or robust criteria (UR); in the other cases resamplings are not sufficient for optimal rates ("mutate large, inherit small" + huge populations and/or surrogate models...)
Portfolio:
Application to noisy opt.; great benefits with several solvers of a given model
Towards wider applications: portfolio of models ?
Adversarial portfolio: successful use of sparsity; parameter-free bandits ?
MCTS and seeds: room for 5 PhDs ... if there is funding for it :-)
Most works here → ROBUSTNESS by COMBINATION
(robust to solvers, to models, to parameters, to seeds ...)
75 / 76
92. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
Conclusion
Thanks for your attention !
Thanks to all the collaborators from Artelys, INRIA, CNRS, Univ.
Paris-Saclay, Univ. Paris-Dauphine, Univ. du Littoral, NDHU ...
76 / 76
93. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references I
Audibert, J.-Y. and Bubeck, S. (2009).
Minimax policies for adversarial and stochastic bandits.
In proceedings of the Annual Conference on Learning Theory (COLT).
Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (1995).
Gambling in a rigged casino: the adversarial multi-armed bandit problem.
In Proceedings of the 36th Annual Symposium on Foundations of Computer
Science, pages 322–331. IEEE Computer Society Press, Los Alamitos, CA.
Auger, A. (2005).
Convergence results for the (1, λ)-SA-ES using the theory of φ-irreducible Markov chains.
Theoretical Computer Science, 334(1):35–69.
Baudiš, P. and Pošík, P. (2014).
Online black-box algorithm portfolios for continuous optimization.
In Parallel Problem Solving from Nature–PPSN XIII, pages 40–49. Springer.
77 / 76
94. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references II
Bayer, A., Bump, D., Daniel, E. B., Denholm, D., Dumonteil, J., Farnebäck, G., Pogonyshev, P., Traber, T., Urvoy, T., and Wallin, I. (2008).
GNU Go 3.8 documentation.
Technical report, Free Software Foundation.
Billingsley, P. (1986).
Probability and Measure.
John Wiley and Sons.
Bouzy, B. (2004).
Associating shallow and selective global tree search with Monte Carlo for 9x9 Go.
In 4th Computer and Games Conference, Ramat-Gan.
Brenner, J. and Cummings, L. (1972).
The Hadamard maximum determinant problem.
Amer. Math. Monthly, 79:626–630.
Bruegmann, B. (1993).
Monte-Carlo Go (unpublished draft,
http://www.althofer.de/bruegmann-montecarlogo.pdf).
78 / 76
95. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references III
Bubeck, S., Munos, R., and Stoltz, G. (2011).
Pure exploration in finitely-armed and continuous-armed bandits.
Theoretical Computer Science, 412(19):1832–1852.
Cazenave, T. (2006).
A phantom-go program.
In van den Herik, H. J., Hsu, S.-C., Hsu, T.-S., and Donkers, H. H. L. M., editors,
Proceedings of Advances in Computer Games, volume 4250 of Lecture Notes in
Computer Science, pages 120–125. Springer.
Cazenave, T. (2009).
Nested monte-carlo search.
In Boutilier, C., editor, IJCAI, pages 456–461.
Cazenave, T. and Borsboom, J. (2007).
Golois wins phantom go tournament.
ICGA Journal, 30(3):165–166.
79 / 76
96. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references IV
Coulom, R. (2006).
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search.
In P. Ciancarini and H. J. van den Herik, editors, Proceedings of the 5th
International Conference on Computers and Games, Turin, Italy, pages 72–83.
Cranley, R. and Patterson, T. (1976).
Randomization of number theoretic methods for multiple integration.
SIAM J. Numer. Anal., 13(6):904–914.
Dupač, V. (1957).
O Kiefer-Wolfowitzově aproximační methodě.
Časopis pro pěstování matematiky, 082(1):47–75.
Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956).
Asymptotic minimax character of the sample distribution function and of the
classical multinomial estimator.
Annals of Mathematical Statistics, 33:642–669.
Fabian, V. (1967).
Stochastic Approximation of Minima with Improved Asymptotic Speed.
Annals of Mathematical statistics, 38:191–200.
80 / 76
97. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references V
Gaudel, R., Hoock, J.-B., Pérez, J., Sokolovska, N., and Teytaud, O. (2010).
A Principled Method for Exploiting Opening Books.
In International Conference on Computers and Games, pages 136–144, Kanazawa, Japan.
Gavin, C., Stewart, S., and Drake, P.
Result aggregation in root-parallelized computer go.
Grigoriadis, M. D. and Khachiyan, L. G. (1995).
A sublinear-time randomized approximation algorithm for matrix games.
Operations Research Letters, 18(2):53–58.
Hadamard, J. (1893).
Résolution d'une question relative aux déterminants.
Bull. Sci. Math., 17:240–246.
Hammersley, J. and Handscomb, D. (1964).
Monte Carlo Methods. Methuen & Co. Ltd., London, page 40.
81 / 76
98. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references VI
Heidrich-Meisner, V. and Igel, C. (2009).
Hoeffding and Bernstein races for selecting policies in evolutionary direct policy
search.
In ICML ’09: Proceedings of the 26th Annual International Conference on
Machine Learning, pages 401–408, New York, NY, USA. ACM.
Jebalia, M. and Auger, A. (2008).
On multiplicative noise models for stochastic search.
In Rudolph, G. et al., editors, Conference on Parallel Problem Solving from Nature (PPSN X), volume 5199, pages 52–61, Berlin, Heidelberg. Springer Verlag.
Liu, J., Saint-Pierre, D. L., Teytaud, O., et al. (2014).
A mathematically derived number of resamplings for noisy optimization.
In Genetic and Evolutionary Computation Conference (GECCO 2014).
Mascagni, M. and Chi, H. (2004).
On the scrambled Halton sequence.
Monte-Carlo Methods Appl., 10(3):435–442.
82 / 76
99. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references VII
Mnih, V., Szepesvári, C., and Audibert, J.-Y. (2008).
Empirical Bernstein stopping.
In ICML ’08: Proceedings of the 25th international conference on Machine
learning, pages 672–679, New York, NY, USA. ACM.
Nagarajan, V., Marcolino, L. S., and Tambe, M. (2015).
Every team deserves a second chance: Identifying when things go wrong
(student abstract version).
In 29th Conference on Artificial Intelligence (AAAI 2015), Texas, USA.
Niederreiter, H. (1992).
Random Number Generation and Quasi-Monte Carlo Methods.
Rechenberg, I. (1973).
Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution.
Fromman-Holzboog Verlag, Stuttgart.
Robson, J. M. (1983).
The complexity of Go.
In IFIP Congress, pages 413–417.
83 / 76
100. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references VIII
Rolet, P. and Teytaud, O. (2010).
Adaptive noisy optimization.
In Di Chio, C., Cagnoni, S., Cotta, C., Ebner, M., Ekárt, A., Esparcia-Alcazar, A., Goh, C.-K., Merelo, J., Neri, F., Preuß, M., Togelius, J., and Yannakakis, G.,
editors, Applications of Evolutionary Computation, volume 6024 of Lecture Notes
in Computer Science, pages 592–601. Springer Berlin Heidelberg.
Saint-Pierre, D. L. and Teytaud, O. (2014).
Nash and the Bandit Approach for Adversarial Portfolios.
In CIG 2014 - Computational Intelligence in Games, page 7, Dortmund, Germany. IEEE.
Samulowitz, H. and Memisevic, R. (2007).
Learning to solve QBF.
In Proceedings of the 22nd National Conference on Artificial Intelligence, pages
255–260. AAAI.
Shamir, O. (2013).
On the complexity of bandit and derivative-free stochastic convex optimization.
In COLT 2013 - The 26th Annual Conference on Learning Theory, June 12-14,
2013, Princeton University, NJ, USA, pages 3–24.
84 / 76
101. PORTFOLIO METHODS IN UNCERTAIN CONTEXTS
References
Some references IX
Storn, R. (1996).
On the usage of differential evolution for function optimization.
In Fuzzy Information Processing Society, 1996. NAFIPS. 1996 Biennial
Conference of the North American, pages 519–523. IEEE.
von Stengel, B. (2002).
Computing equilibria for two-person games.
Handbook of Game Theory, 3:1723 – 1759.
Wang, X. and Hickernell, F. (2000).
Randomized Halton sequences.
Math. Comput. Modelling, 32:887–899.
85 / 76