1. On the Convergence of Single-Call Stochastic Extra-Gradient Methods
Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos
NeurIPS, December 2019
Y-G. H., F. I., J. M., P. M. Single-Call Extra-Gradient NeurIPS, December 2019 0 / 22
2. Outline:
1 Variational Inequality
2 Extra-Gradient
3 Single-call Extra-Gradient [Main Focus]
4 Conclusion
4. Variational Inequality
Introduction: Variational Inequalities in Machine Learning
Generative adversarial network (GAN)
$\min_{\theta} \max_{\phi} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log(D_\phi(x))] + \mathbb{E}_{z \sim p_Z}[\log(1 - D_\phi(G_\theta(z)))]$.
More min-max (saddle point) problems: distributionally robust learning, primal-dual formulations in optimization, ...
Search for equilibria: games, multi-agent reinforcement learning, ...
5. Variational Inequality
Definition
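Stampacchia variational inequality
Find $x^\star \in \mathcal{X}$ such that $\langle V(x^\star), x - x^\star \rangle \ge 0$ for all $x \in \mathcal{X}$. (SVI)
Minty variational inequality
Find $x^\star \in \mathcal{X}$ such that $\langle V(x), x - x^\star \rangle \ge 0$ for all $x \in \mathcal{X}$. (MVI)
With closed convex set $\mathcal{X} \subseteq \mathbb{R}^d$ and vector field $V \colon \mathbb{R}^d \to \mathbb{R}^d$.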
6. Variational Inequality
Illustration
SVI: $V(x^\star)$ belongs to the dual cone $\mathrm{DC}(x^\star)$ of $\mathcal{X}$ at $x^\star$ [ local ]
MVI: $V(x)$ forms an acute angle with the tangent vector $x - x^\star \in \mathrm{TC}(x^\star)$ [ global ]
7. Variational Inequality
Example: Function Minimization
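$\min_{x} f(x)$ subject to $x \in \mathcal{X}$
$f \colon \mathcal{X} \to \mathbb{R}$ is the differentiable function to minimize.
Let $V = \nabla f$.
(SVI) $\forall x \in \mathcal{X}$, $\langle \nabla f(x^\star), x - x^\star \rangle \ge 0$ [ first-order optimality ]
(MVI) $\forall x \in \mathcal{X}$, $\langle \nabla f(x), x - x^\star \rangle \ge 0$ [ $x^\star$ is a minimizer of $f$ ]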
If f is convex, (SVI) and (MVI) are equivalent.
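As a small numerical illustration of the two conditions, the sketch below takes a convex quadratic on a box and checks (SVI) and (MVI) on random feasible points. The objective $f(x) = \frac12\|x - c\|^2$, the box $[0,1]^2$, the point `c`, and all helper names are assumptions chosen for illustration; they do not come from the slides.

```python
import random

def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d."""
    return [min(max(xi, lo), hi) for xi in x]

def grad_f(x, c):
    """Gradient of the illustrative objective f(x) = 0.5 * ||x - c||^2."""
    return [xi - ci for xi, ci in zip(x, c)]

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def diff(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

c = [1.5, -0.3]            # unconstrained minimizer, lies outside the box
x_star = project_box(c)    # constrained minimizer of f over [0, 1]^2

rng = random.Random(0)
samples = [[rng.random(), rng.random()] for _ in range(1000)]

# (SVI): the gradient is frozen at x_star.
svi_min = min(inner(grad_f(x_star, c), diff(x, x_star)) for x in samples)
# (MVI): the gradient is evaluated at the test point x.
mvi_min = min(inner(grad_f(x, c), diff(x, x_star)) for x in samples)

print("smallest SVI value over samples:", svi_min)
print("smallest MVI value over samples:", mvi_min)
```

Both minima should be nonnegative here, in line with the equivalence of (SVI) and (MVI) for convex $f$. Sampling only tests finitely many points, so this is a sanity check rather than a proof.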
8. Variational Inequality
Example: Saddle point Problem
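Find $x^\star = (\theta^\star, \phi^\star)$ such that
$L(\theta^\star, \phi) \le L(\theta^\star, \phi^\star) \le L(\theta, \phi^\star)$ for all $\theta \in \Theta$ and all $\phi \in \Phi$.
$\mathcal{X} \equiv \Theta \times \Phi$ and $L \colon \mathcal{X} \to \mathbb{R}$ is a differentiable function.
Let $V = (\nabla_\theta L, -\nabla_\phi L)$.
(SVI) $\forall (\theta, \phi) \in \mathcal{X}$, $\langle \nabla_\theta L(x^\star), \theta - \theta^\star \rangle - \langle \nabla_\phi L(x^\star), \phi - \phi^\star \rangle \ge 0$ [ stationary ]
(MVI) $\forall (\theta, \phi) \in \mathcal{X}$, $\langle \nabla_\theta L(x), \theta - \theta^\star \rangle - \langle \nabla_\phi L(x), \phi - \phi^\star \rangle \ge 0$ [ saddle point ]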
If L is convex-concave, (SVI) and (MVI) are equivalent.
9. Variational Inequality
Monotonicity
The solutions of (SVI) and (MVI) coincide when $V$ is continuous and monotone, i.e.,
$\langle V(x') - V(x), x' - x \rangle \ge 0$ for all $x, x' \in \mathbb{R}^d$.
In the above two examples, this corresponds to either $f$ being convex or $L$ being convex-concave.
The operator analogue of strong convexity is strong monotonicity:
$\langle V(x') - V(x), x' - x \rangle \ge \alpha \|x' - x\|^2$ for some $\alpha > 0$ and all $x, x' \in \mathbb{R}^d$.
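The two definitions can be probed numerically on small saddle-point operators. The sketch below assumes two illustrative objectives that are not taken from the slides: the bilinear $L(\theta,\phi) = \theta\phi$, whose operator is $V(\theta,\phi) = (\phi, -\theta)$, and a regularized variant $L(\theta,\phi) = \frac{\alpha}{2}\theta^2 + \theta\phi - \frac{\alpha}{2}\phi^2$. Function names are hypothetical.

```python
import random

def V_bilinear(x):
    """V = (grad_theta L, -grad_phi L) for the illustrative L(theta, phi) = theta * phi."""
    theta, phi = x
    return (phi, -theta)

def make_V_regularized(alpha):
    """Operator of L = alpha/2 * theta^2 + theta * phi - alpha/2 * phi^2 (assumed example)."""
    def V(x):
        theta, phi = x
        return (alpha * theta + phi, -theta + alpha * phi)
    return V

def gap_and_sqdist(V, x, y):
    """Return <V(y) - V(x), y - x> together with ||y - x||^2."""
    vx, vy = V(x), V(y)
    gap = sum((b - a) * (q - p) for a, b, p, q in zip(vx, vy, x, y))
    sqdist = sum((q - p) ** 2 for p, q in zip(x, y))
    return gap, sqdist

alpha = 0.5
V_reg = make_V_regularized(alpha)
rng = random.Random(1)

def random_point():
    return (rng.uniform(-5, 5), rng.uniform(-5, 5))

pairs = [(random_point(), random_point()) for _ in range(1000)]

# Bilinear case: the monotonicity gap vanishes, so V is monotone but not strongly so.
bilinear_gaps = [gap_and_sqdist(V_bilinear, x, y)[0] for x, y in pairs]
# Regularized case: gap - alpha * ||y - x||^2 should be nonnegative (up to rounding).
reg_slack = [g - alpha * d for g, d in (gap_and_sqdist(V_reg, x, y) for x, y in pairs)]

print("max |gap| for the bilinear operator:", max(abs(g) for g in bilinear_gaps))
print("min slack for the regularized operator:", min(reg_slack))
```

The bilinear operator sits exactly on the boundary of monotonicity, which is one way to see why it is a hard test case for first-order methods.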
11. Extra-Gradient
From Forward-backward to Extra-Gradient
Forward-backward
$X_{t+1} = \Pi_{\mathcal{X}}(X_t - \gamma_t V(X_t))$ (FB)
Extra-Gradient [Korpelevich 1976]
$X_{t+1/2} = \Pi_{\mathcal{X}}(X_t - \gamma_t V(X_t))$
$X_{t+1} = \Pi_{\mathcal{X}}(X_t - \gamma_t V(X_{t+1/2}))$ (EG)
The Extra-Gradient method anticipates the landscape of $V$ by taking an extrapolation step to reach the leading state $X_{t+1/2}$.
12. Extra-Gradient
From Forward-backward to Extra-Gradient
Forward-backward does not converge in bilinear games, while Extra-Gradient does.
$\min_{\theta \in \mathbb{R}} \max_{\phi \in \mathbb{R}} \theta\phi$
[Figure. Left: Forward-backward. Right: Extra-Gradient.]
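The contrast between the two methods on $\min_\theta \max_\phi \theta\phi$ can be reproduced in a few lines. The sketch below is unconstrained ($\mathcal{X} = \mathbb{R}^2$, so the projection is the identity); the starting point, the step-size $\gamma = 0.5$, and the iteration count are arbitrary choices for illustration.

```python
import math

def V(x):
    """Operator of the bilinear game: V(theta, phi) = (phi, -theta)."""
    theta, phi = x
    return (phi, -theta)

def fb_step(x, gamma):
    """Forward-backward step; no projection because X = R^2 here."""
    v = V(x)
    return (x[0] - gamma * v[0], x[1] - gamma * v[1])

def eg_step(x, gamma):
    """Extra-Gradient step: extrapolate to the leading state, then update from x."""
    v = V(x)
    x_half = (x[0] - gamma * v[0], x[1] - gamma * v[1])
    v_half = V(x_half)
    return (x[0] - gamma * v_half[0], x[1] - gamma * v_half[1])

def run(step, x0, gamma, n_iters):
    x = x0
    for _ in range(n_iters):
        x = step(x, gamma)
    return x

x0 = (1.0, 1.0)
gamma = 0.5
fb_norm = math.hypot(*run(fb_step, x0, gamma, 100))
eg_norm = math.hypot(*run(eg_step, x0, gamma, 100))

print("distance to the solution (0, 0) after 100 FB steps:", fb_norm)
print("distance to the solution (0, 0) after 100 EG steps:", eg_norm)
```

For this linear operator, each FB step multiplies the squared norm by $1 + \gamma^2$, while each EG step multiplies it by $1 - \gamma^2 + \gamma^4$, which is below 1 for $0 < \gamma < 1$. The FB iterates therefore spiral outward and the EG iterates spiral inward.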
13. Extra-Gradient
Stochastic Oracle
If a stochastic oracle is involved:
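$X_{t+1/2} = \Pi_{\mathcal{X}}(X_t - \gamma_t \hat{V}_t)$
$X_{t+1} = \Pi_{\mathcal{X}}(X_t - \gamma_t \hat{V}_{t+1/2})$
With $\hat{V}_t = V(X_t) + Z_t$ (and likewise for $\hat{V}_{t+1/2}$) satisfying
a) Zero-mean: $\mathbb{E}[Z_t \mid \mathcal{F}_t] = 0$.
b) Bounded variance: $\mathbb{E}[\|Z_t\|^2 \mid \mathcal{F}_t] \le \sigma^2$.
$(\mathcal{F}_t)_{t \in \mathbb{N}/2}$ is the natural filtration associated to the stochastic process $(X_t)_{t \in \mathbb{N}/2}$.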
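One way to picture the oracle model is a wrapper that adds zero-mean noise to $V$ and counts how often it is queried. The sketch below is an assumption-laden illustration: the class `NoisyOracle`, the Gaussian noise, the operator $V(x) = x$, and the constant step-size are all chosen for simplicity and are not prescribed by the slides.

```python
import math
import random

class NoisyOracle:
    """Hypothetical stochastic oracle: returns V(x) + Z with E[Z] = 0 and E||Z||^2 = sigma^2."""

    def __init__(self, V, sigma, seed=0):
        self.V = V
        self.sigma = sigma
        self.rng = random.Random(seed)
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        value = self.V(x)
        # Per-coordinate standard deviation sigma / sqrt(d), so that E||Z||^2 = sigma^2.
        scale = self.sigma / math.sqrt(len(value))
        return tuple(v + self.rng.gauss(0.0, scale) for v in value)

def V(x):
    """Illustrative strongly monotone operator V(x) = x (an assumption)."""
    return tuple(x)

def stochastic_eg(oracle, x1, gamma, n_iters):
    """Unconstrained stochastic Extra-Gradient with a constant step-size."""
    x = x1
    for _ in range(n_iters):
        v = oracle(x)                                   # first oracle call
        x_half = tuple(xi - gamma * vi for xi, vi in zip(x, v))
        v_half = oracle(x_half)                         # second oracle call
        x = tuple(xi - gamma * vi for xi, vi in zip(x, v_half))
    return x

n_iters = 200
oracle = NoisyOracle(V, sigma=1.0, seed=0)
x_final = stochastic_eg(oracle, (3.0, -2.0), gamma=0.1, n_iters=n_iters)
print("oracle calls:", oracle.calls, "for", n_iters, "iterations")

# Empirical check of assumptions (a) and (b) at a fixed query point.
probe = NoisyOracle(V, sigma=1.0, seed=1)
point = (0.5, -0.5)
noise = [tuple(v - p for v, p in zip(probe(point), V(point))) for _ in range(20000)]
mean_noise = tuple(sum(z[i] for z in noise) / len(noise) for i in range(2))
mean_sq_norm = sum(z[0] ** 2 + z[1] ** 2 for z in noise) / len(noise)
print("empirical mean of Z:", mean_noise)
print("empirical E||Z||^2:", mean_sq_norm)
```

The call counter makes the cost structure explicit: this two-step scheme queries the oracle twice per iteration.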
14. Extra-Gradient
Convergence Metrics
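Ergodic convergence: restricted error function
$\mathrm{Err}_R(\hat{x}) = \max_{x \in \mathcal{X}_R} \langle V(x), \hat{x} - x \rangle$,
where $\mathcal{X}_R \equiv \mathcal{X} \cap \mathbb{B}_R(0) = \{x \in \mathcal{X} : \|x\| \le R\}$.
Last iterate convergence: squared distance $\mathrm{dist}(\hat{x}, \mathcal{X}^\star)^2$.
Lemma [Nesterov 2007]
Assume $V$ is monotone. If $x^\star$ is a solution of (SVI), we have $\mathrm{Err}_R(x^\star) = 0$ for all sufficiently large $R$. Conversely, if $\mathrm{Err}_R(\hat{x}) = 0$ for large enough $R > 0$ and some $\hat{x} \in \mathcal{X}_R$, then $\hat{x}$ is a solution of (SVI).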
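To make the restricted error function concrete, consider the unconstrained bilinear operator $V(\theta,\phi) = (\phi, -\theta)$ as an assumed example. Since $\langle V(x), x \rangle = 0$, the objective reduces to a linear function of $x$, and a short calculation gives $\mathrm{Err}_R(\hat{x}) = R\,\|\hat{x}\|$ for this particular operator. The sketch below compares that closed form with a brute-force maximization over a polar grid; the grid sizes and function names are illustrative.

```python
import math

def V(x):
    """Bilinear-game operator V(theta, phi) = (phi, -theta) (assumed example)."""
    return (x[1], -x[0])

def err_closed_form(x_hat, R):
    """Closed form derived for this specific operator: Err_R(x_hat) = R * ||x_hat||."""
    return R * math.hypot(*x_hat)

def err_brute_force(x_hat, R, n_angles=3600, n_radii=20):
    """Approximate max over the ball of radius R of <V(x), x_hat - x> on a polar grid."""
    best = 0.0  # x = 0 is feasible and gives the value 0
    for i in range(1, n_radii + 1):
        r = R * i / n_radii
        for k in range(n_angles):
            a = 2.0 * math.pi * k / n_angles
            x = (r * math.cos(a), r * math.sin(a))
            v = V(x)
            value = v[0] * (x_hat[0] - x[0]) + v[1] * (x_hat[1] - x[1])
            best = max(best, value)
    return best

R = 2.0
x_hat = (0.3, -0.4)
closed = err_closed_form(x_hat, R)
brute = err_brute_force(x_hat, R)
at_solution = err_brute_force((0.0, 0.0), R)

print("closed form:", closed)
print("brute force:", brute)
print("error at the solution (0, 0):", at_solution)
```

In this example the error vanishes only at $\hat{x} = 0$, the unique solution, which matches the lemma.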
15. Extra-Gradient
Literature Review
We further suppose that V is β-Lipschitz.
Reference             Convergence type          Hypothesis
Korpelevich 1976      Last iterate, asymptotic  Pseudo-monotone
Tseng 1995            Last iterate, geometric   Monotone + error bound (e.g., strongly monotone, affine)
Nemirovski 2004       Ergodic in O(1/t)         Monotone
Juditsky et al. 2011  Ergodic in O(1/√t)        Stochastic monotone
16. Extra-Gradient
In Deep Learning
Extra-Gradient (EG) needs two oracle calls per iteration, and gradient computations can be very costly for deep models.
What if we drop one oracle call per iteration?
17. Single-call Extra-Gradient [Main Focus]
Single-call Extra-Gradient
19. Single-call Extra-Gradient [Main Focus]
A First Result
Extra-Gradient template
[Step 1] $X_{t+1/2} = \Pi_{\mathcal{X}}(X_t - \gamma_t \hat{V}_t)$
[Step 2] $X_{t+1} = \Pi_{\mathcal{X}}(X_t - \gamma_t \hat{V}_{t+1/2})$
Proxy
PEG: [Step 1] $\hat{V}_t \leftarrow \hat{V}_{t-1/2}$
RG: [Step 1] $\hat{V}_t \leftarrow (X_{t-1} - X_t)/\gamma_t$; no projection
OG: [Step 1] $\hat{V}_t \leftarrow \hat{V}_{t-1/2}$; [Step 2] $X_t \leftarrow X_{t+1/2} + \gamma_t \hat{V}_{t-1/2}$; no projection
Proposition
Suppose that the Single-call Extra-Gradient (1-EG) methods presented above share the same initialization, $X_0 = X_1 \in \mathcal{X}$, $\hat{V}_{1/2} = 0$, and the same constant step-size $(\gamma_t)_{t \in \mathbb{N}} \equiv \gamma$. If $\mathcal{X} = \mathbb{R}^d$, the generated iterates $X_t$ coincide for all $t \ge 1$.
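The proposition can be checked numerically by writing out the three single-call variants in the unconstrained case. The sketch below uses a deterministic oracle and an illustrative linear operator (both assumptions); with a stochastic oracle the same comparison would require feeding identical noise to all three methods. Each loop body queries $V$ exactly once.

```python
def V(x):
    """Illustrative strongly monotone linear operator (an assumption)."""
    return (0.5 * x[0] + x[1], -x[0] + 0.5 * x[1])

def axpy(a, u, v):
    """Return a * u + v coordinate-wise."""
    return tuple(a * ui + vi for ui, vi in zip(u, v))

def run_peg(x1, gamma, n_iters):
    """Past Extra-Gradient: reuse the previous leading-state gradient in Step 1."""
    x, v_prev, traj = x1, (0.0, 0.0), [x1]
    for _ in range(n_iters):
        x_half = axpy(-gamma, v_prev, x)
        v_prev = V(x_half)                 # the single oracle call
        x = axpy(-gamma, v_prev, x)
        traj.append(x)
    return traj

def run_rg(x1, gamma, n_iters):
    """Reflected Gradient: Step 1 uses the proxy (X_{t-1} - X_t) / gamma."""
    x_prev, x, traj = x1, x1, [x1]         # X_0 = X_1
    for _ in range(n_iters):
        x_half = tuple(2.0 * a - b for a, b in zip(x, x_prev))
        v_half = V(x_half)                 # the single oracle call
        x_prev, x = x, axpy(-gamma, v_half, x)
        traj.append(x)
    return traj

def run_og(x1, gamma, n_iters):
    """Optimistic Gradient: Step 2 starts from X_{t+1/2} + gamma * V_{t-1/2}."""
    x, v_prev, traj = x1, (0.0, 0.0), [x1]
    for _ in range(n_iters):
        x_half = axpy(-gamma, v_prev, x)
        v_half = V(x_half)                 # the single oracle call
        x = axpy(-gamma, v_half, axpy(gamma, v_prev, x_half))
        v_prev = v_half
        traj.append(x)
    return traj

x1, gamma, n_iters = (1.0, -2.0), 0.1, 50
peg, rg, og = (run(x1, gamma, n_iters) for run in (run_peg, run_rg, run_og))

max_gap = max(
    max(abs(a - b), abs(a - c))
    for p, r, o in zip(peg, rg, og)
    for a, b, c in zip(p, r, o)
)
print("largest coordinate-wise gap between the three trajectories:", max_gap)
```

The gap should sit at the level of floating-point rounding. With a projection onto a proper subset of $\mathbb{R}^d$ the three updates are no longer algebraically identical, which is why the proposition assumes $\mathcal{X} = \mathbb{R}^d$.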
20. Single-call Extra-Gradient [Main Focus]
Global Convergence Rate
Always with Lipschitz continuity.
Stochastic strongly monotone: step size in O(1/t).
New results!
               Monotone                 Strongly monotone
               Ergodic   Last iterate   Ergodic   Last iterate
Deterministic  1/t       Unknown        1/t       e^{−ρt}
Stochastic     1/√t      Unknown        1/t       1/t
21. Single-call Extra-Gradient [Main Focus]
Proof Ingredients
Descent Lemma [Deterministic + Monotone]
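There exists $(\mu_t)_{t \in \mathbb{N}} \in \mathbb{R}_+^{\mathbb{N}}$ such that for all $p \in \mathcal{X}$,
$\|X_{t+1} - p\|^2 + \mu_{t+1} \le \|X_t - p\|^2 - 2\gamma \langle V(X_{t+1/2}), X_{t+1/2} - p \rangle + \mu_t$.
Descent Lemma [Stochastic + Strongly Monotone]
Let $x^\star$ be the unique solution of (SVI). There exists $(\mu_t)_{t \in \mathbb{N}} \in \mathbb{R}_+^{\mathbb{N}}$, $M \in \mathbb{R}_+$ such that
$\mathbb{E}[\|X_{t+1} - x^\star\|^2] + \mu_{t+1} \le (1 - \alpha\gamma_t)(\mathbb{E}[\|X_t - x^\star\|^2] + \mu_t) + M\gamma_t^2\sigma^2$.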
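To see how the stochastic descent lemma leads to a $1/t$ rate, one can unroll the recursion with a step-size $\gamma_t = \gamma/(t+b)$. The sketch below writes $a_t$ for $\mathbb{E}[\|X_t - x^\star\|^2] + \mu_t$ and treats the inequality as an equality (the worst case). All constants ($\alpha$, $\gamma$, $b$, $M$, $\sigma$, $a_1$) are arbitrary illustrative values, picked so that $\alpha\gamma > 1$.

```python
def unroll(alpha, gamma, b, M, sigma, a1, T):
    """Iterate a_{t+1} = (1 - alpha * gamma_t) * a_t + M * gamma_t^2 * sigma^2 up to a_T."""
    a = a1
    for t in range(1, T):
        gamma_t = gamma / (t + b)
        a = (1.0 - alpha * gamma_t) * a + M * gamma_t ** 2 * sigma ** 2
    return a

T = 10000
a_T = unroll(alpha=1.0, gamma=2.0, b=2.0, M=1.0, sigma=1.0, a1=1.0, T=T)
scaled = T * a_T

print("a_T:", a_T)
print("T * a_T:", scaled)
```

The product $T \cdot a_T$ stays bounded as $T$ grows, which is the $O(1/t)$ behaviour reported for the stochastic strongly monotone case. A heuristic balance of the leading terms suggests a limit of $M\gamma^2\sigma^2/(\alpha\gamma - 1)$, under the assumption $\alpha\gamma > 1$.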
22. Single-call Extra-Gradient [Main Focus]
Regular Solution
Definition [Regular Solution]
We say that $x^\star$ is a regular solution of (SVI) if $V$ is $C^1$-smooth in a neighborhood of $x^\star$ and the Jacobian $\mathrm{Jac}_V(x^\star)$ is positive-definite along rays emanating from $x^\star$, i.e.,
$z^\top \mathrm{Jac}_V(x^\star) z \equiv \sum_{i,j=1}^{d} z_i \frac{\partial V_i}{\partial x_j}(x^\star) z_j > 0$ for all $z \in \mathbb{R}^d \setminus \{0\}$ that are tangent to $\mathcal{X}$ at $x^\star$.
To be compared with
positive definiteness of the Hessian along qualified constraints in minimization;
differential equilibrium in games.
Localization of strong monotonicity.
23. Single-call Extra-Gradient [Main Focus]
Local Convergence
Theorem [Local convergence for stochastic non-monotone operators]
Let $x^\star$ be a regular solution of (SVI) and fix a tolerance level $\delta > 0$. Suppose (PEG) is run with step-sizes of the form $\gamma_t = \gamma/(t + b)$ for large enough $\gamma$ and $b$. Then:
(a) There are neighborhoods $U$ and $U_1$ of $x^\star$ in $\mathcal{X}$ such that, if $X_{1/2} \in U$, $X_1 \in U_1$, the event
$E_\infty = \{X_{t+1/2} \in U \text{ for all } t = 1, 2, \dots\}$
occurs with probability at least $1 - \delta$.
(b) Conditioning on the above, we have:
$\mathbb{E}[\|X_t - x^\star\|^2 \mid E_\infty] = O(1/t)$.
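A toy experiment can illustrate the flavour of this local result. The sketch below runs stochastic PEG on the one-dimensional operator $V(x) = x - x^3$, which is not monotone on $\mathbb{R}$ but has $V(0) = 0$ and $V'(0) = 1 > 0$, so $x^\star = 0$ is a regular solution. The operator, the neighborhood $U = (-0.5, 0.5)$, the Gaussian noise, the choice $X_{1/2} = X_1$, and all constants are assumptions made for illustration only.

```python
import random

def V(x):
    """Non-monotone illustrative operator with a regular solution at x = 0."""
    return x - x ** 3

def run_peg(seed, x1=0.3, gamma=2.0, b=20.0, sigma=0.1, T=2000, radius=0.5):
    """Stochastic PEG with gamma_t = gamma / (t + b); returns (X_T, stayed in U)."""
    rng = random.Random(seed)

    def oracle(x):
        return V(x) + rng.gauss(0.0, sigma)

    x = x1
    v_prev = oracle(x1)          # assumes X_{1/2} = X_1 for the first stored gradient
    stayed = True
    for t in range(1, T + 1):
        gamma_t = gamma / (t + b)
        x_half = x - gamma_t * v_prev
        stayed = stayed and abs(x_half) < radius
        v_prev = oracle(x_half)  # the single oracle call of this iteration
        x = x - gamma_t * v_prev
    return x, stayed

results = [run_peg(seed) for seed in range(20)]
frac_stayed = sum(1 for _, s in results if s) / len(results)
mean_sq_error = sum(x * x for x, _ in results) / len(results)

print("fraction of runs whose leading states stayed in U:", frac_stayed)
print("mean squared distance to x* after 2000 iterations:", mean_sq_error)
```

With this small noise level every run is expected to remain in $U$, and the final squared error is small. The experiment only illustrates the statement on one hand-picked example; it says nothing about how large $\gamma$ and $b$ must be in general.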
26. Conclusion
Conclusion
Single-call rates ∼ Two-call rates.
Localization of stochastic guarantee.
Last iterate convergence: a first step to the non-monotone world.
Some research directions: Bregman, universal, ...
27. Conclusion
Bibliography
Daskalakis, Constantinos et al. (2018). “Training GANs with optimism”. In: ICLR ’18: Proceedings of the
2018 International Conference on Learning Representations.
Juditsky, Anatoli, Arkadi Semen Nemirovski, and Claire Tauvel (2011). “Solving variational inequalities with
stochastic mirror-prox algorithm”. In: Stochastic Systems 1.1, pp. 17–58.
Korpelevich, G. M. (1976). “The extragradient method for finding saddle points and other problems”. In:
Èkonom. i Mat. Metody 12, pp. 747–756.
Malitsky, Yura (2015). “Projected reflected gradient methods for monotone variational inequalities”. In:
SIAM Journal on Optimization 25.1, pp. 502–520.
Nemirovski, Arkadi Semen (2004). “Prox-method with rate of convergence O(1/t) for variational inequalities
with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems”. In:
SIAM Journal on Optimization 15.1, pp. 229–251.
Nesterov, Yurii (2007). “Dual extrapolation and its applications to solving variational inequalities and related
problems”. In: Mathematical Programming 109.2, pp. 319–344.
Popov, Leonid Denisovich (1980). “A modification of the Arrow–Hurwicz method for search of saddle
points”. In: Mathematical Notes of the Academy of Sciences of the USSR 28.5, pp. 845–848.
Tseng, Paul (June 1995). “On linear convergence of iterative methods for the variational inequality problem”.
In: Journal of Computational and Applied Mathematics 60.1-2, pp. 237–252.