2. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas-Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
3. Measure Preserving Maps
Distributions µ0, µ1 on R^k.
[Figure: point clouds sampling µ0 and µ1.]
4. Measure Preserving Maps
Distributions µ0, µ1 on R^k.
Mass preserving map T: R^k → R^k such that
µ1 = T#µ0, where (T#µ0)(A) = µ0(T^{-1}(A)).
[Figure: x ↦ T(x) pushing µ0 onto µ1.]
5. Measure Preserving Maps
Distributions µ0, µ1 on R^k.
Mass preserving map T: R^k → R^k such that
µ1 = T#µ0, where (T#µ0)(A) = µ0(T^{-1}(A)).
Distributions with densities: µi = ρi(x) dx.
T#µ0 = µ1  ⟺  ρ1(T(x)) |det ∂T(x)| = ρ0(x).
7. Optimal Transport
L^p optimal transport:
W_p(µ0, µ1)^p = min_{T#µ0 = µ1} ∫ ||T(x) − x||^p µ0(dx)
Regularity condition: µ0 or µ1 does not give mass to "small sets".
Theorem (p > 1): there exists a unique optimal T.
[Figure: the optimal map T pushing µ0 onto µ1.]
8. Optimal Transport
L^p optimal transport:
W_p(µ0, µ1)^p = min_{T#µ0 = µ1} ∫ ||T(x) − x||^p µ0(dx)
Regularity condition: µ0 or µ1 does not give mass to "small sets".
Theorem (p > 1): there exists a unique optimal T.
Theorem (p = 2): T is defined as T = ∇φ with φ convex.
T is monotone: ⟨T(x) − T(x'), x − x'⟩ ≥ 0.
9. Wasserstein Distance
Couplings Π(µ, ν): measures π on R^d × R^d such that
∀ A ⊂ R^d, π(A × R^d) = µ(A)
∀ B ⊂ R^d, π(R^d × B) = ν(B)
10. Wasserstein Distance
Couplings Π(µ, ν): measures π on R^d × R^d such that
∀ A ⊂ R^d, π(A × R^d) = µ(A)
∀ B ⊂ R^d, π(R^d × B) = ν(B)
Transportation cost:
W_p(µ, ν)^p = min_{π ∈ Π(µ, ν)} ∫_{R^d × R^d} c(x, y) dπ(x, y)
12. Optimal Transport
Let p > 1 and µ not vanishing on small sets.
Unique π ∈ Π(µ, ν) s.t. W_p(µ, ν)^p = ∫_{R^d × R^d} c(x, y) dπ(x, y)
Optimal transport map T: R^d → R^d:
π is supported on the graph {(x, T(x))}.
13. Optimal Transport
Let p > 1 and µ not vanishing on small sets.
Unique π ∈ Π(µ, ν) s.t. W_p(µ, ν)^p = ∫_{R^d × R^d} c(x, y) dπ(x, y)
Optimal transport map T: R^d → R^d:
π is supported on the graph {(x, T(x))}.
p = 2: T = ∇φ is the unique solution of
φ convex l.s.c., (∇φ)#µ = ν
15. 1-D Continuous Wasserstein
Distributions µ, ν on R.
Cumulative functions: Cµ(t) = ∫_{−∞}^t dµ(x)
For all p > 1: T = Cν^{-1} ∘ Cµ
T is non-decreasing ("change of contrast").
Explicit formulas:
W_p(µ, ν)^p = ∫_0^1 |Cµ^{-1} − Cν^{-1}|^p
W_1(µ, ν) = ∫_R |Cµ − Cν| = ||(µ − ν) ⋆ H||_1
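The following is a minimal numerical sketch (not part of the original slides) of the 1-D formulas above: for two empirical distributions with the same number of equally weighted samples, sorting both point sets realizes the monotone map T = Cν^{-1} ∘ Cµ, and W_p reduces to a sum over sorted pairs. The function name and toy data are illustrative.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """1-D p-Wasserstein distance between two empirical distributions
    with the same number of equally weighted samples: sorting both
    point sets realizes the monotone map T = C_nu^{-1} o C_mu."""
    xs, ys = np.sort(x), np.sort(y)
    return (np.mean(np.abs(xs - ys) ** p)) ** (1.0 / p)

# toy example: samples from two Gaussians
rng = np.random.default_rng(0)
mu = rng.normal(0.0, 1.0, 1000)
nu = rng.normal(3.0, 0.5, 1000)
print(wasserstein_1d(mu, nu, p=2))
```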
17. Grayscale Histogram Transfer
Input images: fi: [0,1]² → [0,1], i = 0, 1.
Gray-value distributions: µi defined on [0,1],
µi([a, b]) = ∫_{[0,1]²} 1_{a ≤ fi(x) ≤ b} dx
[Figure: images f0, f1 and their gray-value histograms µ0, µ1.]
18. Grayscale Histogram Transfer
Input images: fi: [0,1]² → [0,1], i = 0, 1.
Gray-value distributions: µi defined on [0,1],
µi([a, b]) = ∫_{[0,1]²} 1_{a ≤ fi(x) ≤ b} dx
Optimal transport: T = Cµ1^{-1} ∘ Cµ0.
[Figure: f0 → Cµ0(f0) → T(f0), the equalized image with histogram µ1.]
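A minimal sketch of the grayscale transfer above, assuming both images have the same number of pixels: ranking the gray values of f0 and replacing them by the sorted gray values of f1 applies T = Cµ1^{-1} ∘ Cµ0 exactly. The helper name and toy images are ours.

```python
import numpy as np

def histogram_transfer(f0, f1):
    """Apply the 1-D optimal transport map T = C_{mu1}^{-1} o C_{mu0}
    to the gray values of f0 so that its histogram matches that of f1.
    Assumes f0 and f1 have the same number of pixels."""
    shape = f0.shape
    ranks = np.argsort(np.argsort(f0.ravel()))   # rank of each pixel of f0
    target = np.sort(f1.ravel())                 # sorted gray values of f1
    return target[ranks].reshape(shape)

# toy example on random "images"
rng = np.random.default_rng(0)
f0 = rng.uniform(0, 1, (64, 64)) ** 2      # dark image
f1 = rng.uniform(0, 1, (64, 64)) ** 0.5    # bright image
f0_eq = histogram_transfer(f0, f1)
```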
19. Application to Color Transfer
Color histogram equalization.
Input color images: fi ∈ R^{N×3}.
Color distributions: µi = (1/N) Σ_x δ_{fi(x)}.
[Figure: source image (X), style image (Y) and source image after color transfer; from J. Rabin, Wasserstein Regularization.]
20. Application to Color Transfer
Input color images: fi ∈ R^{N×3}.
Color distributions: µi = (1/N) Σ_x δ_{fi(x)}.
Optimal assignment: min_{σ ∈ Σ_N} ||f0 − f1 ∘ σ||
21. Application to Color Transfer
Input color images: fi ∈ R^{N×3}.
Color distributions: µi = (1/N) Σ_x δ_{fi(x)}.
Optimal assignment: min_{σ ∈ Σ_N} ||f0 − f1 ∘ σ||
Transport: T: f0(xi) ∈ R³ ↦ f1(x_{σ(i)}) ∈ R³
22. Application to Color Transfer
Input color images: fi ∈ R^{N×3}.
Color distributions: µi = (1/N) Σ_x δ_{fi(x)}.
Optimal assignment: min_{σ ∈ Σ_N} ||f0 − f1 ∘ σ||
Transport: T: f0(xi) ∈ R³ ↦ f1(x_{σ(i)}) ∈ R³
Equalization: f̃0 = T(f0)  ⟹  µ_{f̃0} = µ1
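A small illustrative sketch of the optimal assignment above, using SciPy's Hungarian solver on a quadratic cost. This is only practical for small N; the large-image setting shown on the slides relies on approximations such as sliced Wasserstein projections. Names and toy data are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def color_transfer_assignment(f0, f1):
    """Optimal assignment sigma minimizing ||f0 - f1 o sigma|| between two
    small color point clouds f0, f1 in R^{N x 3} (Hungarian algorithm)."""
    cost = ((f0[:, None, :] - f1[None, :, :]) ** 2).sum(-1)  # N x N squared distances
    rows, sigma = linear_sum_assignment(cost)
    return f1[sigma]            # T(f0(x_i)) = f1(x_{sigma(i)})

rng = np.random.default_rng(0)
f0 = rng.uniform(0, 1, (200, 3))   # source colors
f1 = rng.uniform(0, 1, (200, 3))   # style colors
f0_new = color_transfer_assignment(f0, f1)
```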
23. Image Registration
c(dv) = µ0(v + dv) det(∇(v + dv)) − µ1 = 0
A correction dv is obtained by solving the system dv ≈ c_v^T (c_v c_v^T)^{-1} c(v) (Nocedal and Wright, 1999).
The system c_v c_v^T can be thought of as an elliptic system of equations; it is solved using preconditioned conjugate gradient with an incomplete Cholesky preconditioner.
The relaxation is performed on the GPU with a parallelizable four-color Gauss-Seidel scheme, using a trilinear interpolation operator for transferring the coarse grid correction to fine grids and a residual restriction operator for projecting the residual from the fine to the coarse grids. This increases robustness and efficiency and is especially suited for implementation on the GPU.
[ur Rehman et al., 2009]
[Fig. 6: OMT results viewed on an axial slice. The top row shows corresponding slices from pre-op (left) and post-op (right) MRI data. The deformation is clearly visible in the anterior part of the brain.]
28. Numerical Examples
[Figure: synthetic 2D examples on a Euclidean domain, showing the densities ρ0 and ρ1 (Figure 7).]
29. Discrete Formulation
Centered grid formulation (d = 1):
min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)
J(x) = Σ_{i ∈ Gc} j(x_i)
[Figure: centered grid Gc in the (t, s) domain.]
30. Discrete Formulation
Centered grid formulation (d = 1):
min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)
J(x) = Σ_{i ∈ Gc} j(x_i)
Staggered grid formulation:
min_{x ∈ R^{G¹st} × R^{G²st}} J(I(x)) + ι_C(x)
[Figure: centered grid Gc and staggered grids G¹st, G²st.]
31. Discrete Formulation
Centered grid formulation (d = 1):
min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)
J(x) = Σ_{i ∈ Gc} j(x_i)
Staggered grid formulation:
min_{x ∈ R^{G¹st} × R^{G²st}} J(I(x)) + ι_C(x)
Interpolation operator:
I = (I¹, I²): R^{G¹st} × R^{G²st} → R^{Gc},  2 I¹(m)_{i,j} = m_{i+1/2, j} + m_{i−1/2, j}
→ Projection on div(x) = 0 using FFTs.
32. SOCP Formulation
min_{x ∈ R^{Gc×d}} J(x) + ι_C(x),  J(x) = Σ_{i ∈ Gc} j(x_i)
⟺ min_{x ∈ R^{Gc×d}, r ∈ R^{Gc}} Σ_i r_i  s.t.  ∀ i ∈ Gc, (m_i, ρ_i, r_i) ∈ K
(Rotated) Lorentz cone: K = {(m̃, ρ̃, r̃) ∈ R^{d+2} : ||m̃||² ≤ ρ̃ r̃}
33. SOCP Formulation
min_{x ∈ R^{Gc×d}} J(x) + ι_C(x),  J(x) = Σ_{i ∈ Gc} j(x_i)
⟺ min_{x ∈ R^{Gc×d}, r ∈ R^{Gc}} Σ_i r_i  s.t.  ∀ i ∈ Gc, (m_i, ρ_i, r_i) ∈ K
(Rotated) Lorentz cone: K = {(m̃, ρ̃, r̃) ∈ R^{d+2} : ||m̃||² ≤ ρ̃ r̃}
Second order cone program:
→ Use interior point methods (e.g. MOSEK software).
Linear convergence with iteration #.
Poor scaling with dimension |Gc|.
Efficient for medium scale problems (N ~ 10^4).
34. Example: Regularization
Inverse problem: measurements y = Φ x0 + w
[Figure: original x0 and observations y.]
35. Example: Regularization
Inverse problem: measurements y = Φ x0 + w
Regularized inversion: x⋆ ∈ argmin_{x ∈ R^N} (1/2)||y − Φx||² + λ R(x)
(data fidelity + regularity)
36. Example: Regularization
Inverse problem: measurements y = Φ x0 + w
Regularized inversion: x⋆ ∈ argmin_{x ∈ R^N} (1/2)||y − Φx||² + λ R(x)
(data fidelity + regularity)
Total variation: R(x) = Σ_i ||(∇x)_i||
37. Example: Regularization
Inverse problem: measurements y = Φ x0 + w
Regularized inversion: x⋆ ∈ argmin_{x ∈ R^N} (1/2)||y − Φx||² + λ R(x)
(data fidelity + regularity)
Total variation: R(x) = Σ_i ||(∇x)_i||
ℓ¹ sparsity: R(x) = Σ_i |x_i|
Images are sparse in wavelet bases: image f = Ψx, coefficients x = Ψ*f.
38. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas-Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
40. Convex Optimization
Setting: G: H → R ∪ {+∞}
H: Hilbert space. Here: H = R^N.
Problem: min_{x ∈ H} G(x)
Class of functions:
Convex: G(tx + (1−t)y) ≤ t G(x) + (1−t) G(y), ∀ t ∈ [0,1]
41. Convex Optimization
Setting: G: H → R ∪ {+∞}
H: Hilbert space. Here: H = R^N.
Problem: min_{x ∈ H} G(x)
Class of functions:
Convex: G(tx + (1−t)y) ≤ t G(x) + (1−t) G(y), ∀ t ∈ [0,1]
Lower semi-continuous: lim inf_{x → x0} G(x) ≥ G(x0)
Proper: {x ∈ H : G(x) ≠ +∞} ≠ ∅
42. Convex Optimization
Setting: G: H → R ∪ {+∞}
H: Hilbert space. Here: H = R^N.
Problem: min_{x ∈ H} G(x)
Class of functions:
Convex: G(tx + (1−t)y) ≤ t G(x) + (1−t) G(y), ∀ t ∈ [0,1]
Lower semi-continuous: lim inf_{x → x0} G(x) ≥ G(x0)
Proper: {x ∈ H : G(x) ≠ +∞} ≠ ∅
Indicator: ι_C(x) = 0 if x ∈ C, +∞ otherwise  (C closed and convex)
44. Sub-differential
Sub-differential: ∂G(x) = {u ∈ H : ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩}
Example: G(x) = |x|, ∂G(0) = [−1, 1]
Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}
45. Sub-differential
Sub-differential: ∂G(x) = {u ∈ H : ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩}
Example: G(x) = |x|, ∂G(0) = [−1, 1]
Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}
First-order conditions: x⋆ ∈ argmin_{x ∈ H} G(x)  ⟺  0 ∈ ∂G(x⋆)
46. Sub-differential
Sub-differential: ∂G(x) = {u ∈ H : ∀ z, G(z) ≥ G(x) + ⟨u, z − x⟩}
Example: G(x) = |x|, ∂G(0) = [−1, 1]
Smooth functions: if F is C¹, ∂F(x) = {∇F(x)}
First-order conditions: x⋆ ∈ argmin_{x ∈ H} G(x)  ⟺  0 ∈ ∂G(x⋆)
Monotone operator: U(x) = ∂G(x) satisfies
∀ (u, v) ∈ U(x) × U(y), ⟨y − x, v − u⟩ ≥ 0
48. Prox and Subdifferential
Prox_γG(x) = argmin_z (1/2)||x − z||² + γ G(z)
Resolvent of ∂G:
z = Prox_γG(x)  ⟺  0 ∈ z − x + γ ∂G(z)  ⟺  x ∈ (Id + γ ∂G)(z)
49. Prox and Subdifferential
Prox_γG(x) = argmin_z (1/2)||x − z||² + γ G(z)
Resolvent of ∂G:
z = Prox_γG(x)  ⟺  0 ∈ z − x + γ ∂G(z)  ⟺  x ∈ (Id + γ ∂G)(z)  ⟺  z = (Id + γ ∂G)^{-1}(x)
Inverse of a set-valued mapping: x ∈ U(y)  ⟺  y ∈ U^{-1}(x)
Prox_γG = (Id + γ ∂G)^{-1} is a single-valued mapping.
50. Prox and Subdifferential
Prox_γG(x) = argmin_z (1/2)||x − z||² + γ G(z)
Resolvent of ∂G:
z = Prox_γG(x)  ⟺  0 ∈ z − x + γ ∂G(z)  ⟺  x ∈ (Id + γ ∂G)(z)  ⟺  z = (Id + γ ∂G)^{-1}(x)
Inverse of a set-valued mapping: x ∈ U(y)  ⟺  y ∈ U^{-1}(x)
Prox_γG = (Id + γ ∂G)^{-1} is a single-valued mapping.
Fix point: x⋆ ∈ argmin_x G(x)
⟺ 0 ∈ ∂G(x⋆)  ⟺  x⋆ ∈ (Id + γ ∂G)(x⋆)
⟺ x⋆ = (Id + γ ∂G)^{-1}(x⋆) = Prox_γG(x⋆)
53. Proximal Calculus
Separability: G(x) = G1(x1) + ... + Gn(xn)
⟹ Prox_γG(x) = (Prox_γG1(x1), ..., Prox_γGn(xn))
Quadratic functionals: G(x) = (1/2)||Φx − y||²
⟹ Prox_γG(x) = (Id + γ Φ*Φ)^{-1}(x + γ Φ*y)
Composition by tight frame: A A* = Id
⟹ Prox_{G∘A} = A* ∘ Prox_G ∘ A + Id − A*A
54. Proximal Calculus
Separability: G(x) = G1(x1) + ... + Gn(xn)
⟹ Prox_γG(x) = (Prox_γG1(x1), ..., Prox_γGn(xn))
Quadratic functionals: G(x) = (1/2)||Φx − y||²
⟹ Prox_γG(x) = (Id + γ Φ*Φ)^{-1}(x + γ Φ*y)
Composition by tight frame: A A* = Id
⟹ Prox_{G∘A} = A* ∘ Prox_G ∘ A + Id − A*A
Indicators: G(x) = ι_C(x)
⟹ Prox_γG(x) = Proj_C(x) = argmin_{z ∈ C} ||x − z||
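A minimal sketch (not from the original slides) of three proximal maps following the rules above: a separable example (soft thresholding for γ||·||₁), the quadratic functional, and an indicator whose prox is a projection. Function names and the box constraint are illustrative.

```python
import numpy as np

def prox_l1(x, gamma):
    """Prox of gamma*||.||_1: soft thresholding (separable case)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0)

def prox_quadratic(x, gamma, Phi, y):
    """Prox of gamma/2*||Phi z - y||^2: solve (Id + gamma Phi^T Phi) z = x + gamma Phi^T y."""
    n = Phi.shape[1]
    return np.linalg.solve(np.eye(n) + gamma * Phi.T @ Phi, x + gamma * Phi.T @ y)

def proj_box(x, lo, hi):
    """Prox of the indicator of C = [lo, hi]^N: projection onto C."""
    return np.clip(x, lo, hi)
```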
55. Prox of Sparse Regularizers
Prox_γG(x) = argmin_z (1/2)||x − z||² + γ G(z)
60. Legendre-Fenchel Duality
Legendre-Fenchel transform: G*(u) = sup_{x ∈ dom(G)} ⟨u, x⟩ − G(x)
[Figure: G(x) and its supporting line of slope u.]
Example: quadratic functional
G(x) = (1/2)⟨Ax, x⟩ + ⟨x, b⟩  ⟹  G*(u) = (1/2)⟨u − b, A^{-1}(u − b)⟩
Moreau's identity: Prox_γG*(x) = x − γ Prox_{G/γ}(x/γ)
G simple  ⟺  G* simple
61. Indicator and Homogeneous Functionals
Positively 1-homogeneous functional: G(λx) = |λ| G(x)
Example: norm G(x) = ||x||
Duality: G*(x) = ι_{G°(·) ≤ 1}(x)  where  G°(y) = max_{G(x) ≤ 1} ⟨x, y⟩
62. Indicator and Homogeneous Functionals
Positively 1-homogeneous functional: G(λx) = |λ| G(x)
Example: norm G(x) = ||x||
Duality: G*(x) = ι_{G°(·) ≤ 1}(x)  where  G°(y) = max_{G(x) ≤ 1} ⟨x, y⟩
ℓ^p norms: G(x) = ||x||_p  ⟹  G°(x) = ||x||_q,  1/p + 1/q = 1,  1 ≤ p, q ≤ +∞
63. Indicator and Homogeneous Functionals
Positively 1-homogeneous functional: G(λx) = |λ| G(x)
Example: norm G(x) = ||x||
Duality: G*(x) = ι_{G°(·) ≤ 1}(x)  where  G°(y) = max_{G(x) ≤ 1} ⟨x, y⟩
ℓ^p norms: G(x) = ||x||_p  ⟹  G°(x) = ||x||_q,  1/p + 1/q = 1,  1 ≤ p, q ≤ +∞
Example: proximal operator of the ℓ^∞ norm
Prox_{λ||·||_∞} = Id − Proj_{||·||_1 ≤ λ}
Proj_{||·||_1 ≤ λ}(x)_i = max(0, 1 − µ/|x_i|) x_i  for a well-chosen µ = µ(x, λ)
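A sketch of the ℓ^∞ prox above via Moreau's identity. The "well-chosen" threshold µ(x, λ) for the projection onto the ℓ¹ ball is computed by the standard sort-based construction, which is not spelled out on the slides; names are ours.

```python
import numpy as np

def proj_l1_ball(x, lam):
    """Euclidean projection of x onto {z : ||z||_1 <= lam} (sort-based)."""
    if np.abs(x).sum() <= lam:
        return x.copy()
    u = np.sort(np.abs(x))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(x) + 1) > css - lam)[0][-1]
    mu = (css[k] - lam) / (k + 1.0)          # the "well chosen" threshold mu(x, lam)
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0)

def prox_linf(x, lam):
    """Prox of lam*||.||_inf via Moreau's identity: x - Proj_{||.||_1 <= lam}(x)."""
    return x - proj_l1_ball(x, lam)
```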
64. Prox of the J Functional
J(m, ρ) = Σ_i j(m_i, ρ_i),   j(m̃, ρ̃) = ||m̃||² / ρ̃  for ρ̃ > 0
65. Prox of the J Functional
J(m, ρ) = Σ_i j(m_i, ρ_i),   j(m̃, ρ̃) = ||m̃||² / ρ̃  for ρ̃ > 0
Prox_γJ(m, ρ) = (Prox_γj(m_i, ρ_i))_i
66. Prox of the J Functional
J(m, ρ) = Σ_i j(m_i, ρ_i),   j(m̃, ρ̃) = ||m̃||² / ρ̃  for ρ̃ > 0
Prox_γJ(m, ρ) = (Prox_γj(m_i, ρ_i))_i
j* = ι_C  where  C = {(a, b) ∈ R^d × R : ||a||²/4 + b ≤ 0}
Prox_γj(x̃) = x̃ − γ Proj_C(x̃/γ)  where  x̃ = (m̃, ρ̃)
67. Prox of the J Functional
J(m, ρ) = Σ_i j(m_i, ρ_i),   j(m̃, ρ̃) = ||m̃||² / ρ̃  for ρ̃ > 0
Prox_γJ(m, ρ) = (Prox_γj(m_i, ρ_i))_i
j* = ι_C  where  C = {(a, b) ∈ R^d × R : ||a||²/4 + b ≤ 0}
Prox_γj(x̃) = x̃ − γ Proj_C(x̃/γ)  where  x̃ = (m̃, ρ̃)
Proposition: Prox_γj(m̃, ρ̃) = (m⋆, ρ⋆) if ρ⋆ > 0, and (0, 0) otherwise,
where m⋆ = ρ⋆ m̃ / (ρ⋆ + 2γ) and ρ⋆ is the largest real root of
X³ + (4γ − ρ̃) X² + 4γ(γ − ρ̃) X − γ||m̃||² − 4γ² ρ̃ = 0
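A numerical sketch of the proposition above, evaluating Prox_γj at a single grid point with numpy.roots; variable names and the toy input are ours.

```python
import numpy as np

def prox_j(m, rho, gamma):
    """Prox of gamma*j at one grid point, following the proposition above:
    rho_star is the largest real root of
    X^3 + (4g - rho) X^2 + 4g(g - rho) X - g||m||^2 - 4g^2 rho = 0,
    and m_star = rho_star * m / (rho_star + 2g)."""
    g = gamma
    coeffs = [1.0,
              4 * g - rho,
              4 * g * (g - rho),
              -g * np.dot(m, m) - 4 * g ** 2 * rho]
    roots = np.roots(coeffs)
    real_roots = roots[np.abs(roots.imag) < 1e-12].real
    rho_star = real_roots.max()
    if rho_star <= 0:
        return np.zeros_like(m), 0.0
    return rho_star * m / (rho_star + 2 * g), rho_star

# example: prox at one grid point with m in R^2
m_star, rho_star = prox_j(np.array([0.5, -0.2]), 1.0, 0.1)
```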
68. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas-Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
69. Gradient and Proximal Descents
Gradient descent: x^{(ℓ+1)} = x^{(ℓ)} − τ ∇G(x^{(ℓ)})   [explicit]
G is C¹ and ∇G is L-Lipschitz.
Theorem: if 0 < τ < 2/L, x^{(ℓ)} → x⋆ a solution.
70. Gradient and Proximal Descents
Gradient descent: x^{(ℓ+1)} = x^{(ℓ)} − τ ∇G(x^{(ℓ)})   [explicit]
G is C¹ and ∇G is L-Lipschitz.
Theorem: if 0 < τ < 2/L, x^{(ℓ)} → x⋆ a solution.
Sub-gradient descent: x^{(ℓ+1)} = x^{(ℓ)} − τ_ℓ v^{(ℓ)},  v^{(ℓ)} ∈ ∂G(x^{(ℓ)})
Theorem: if τ_ℓ ~ 1/ℓ, x^{(ℓ)} → x⋆ a solution.
Problem: slow.
71. Gradient and Proximal Descents
Gradient descent: x^{(ℓ+1)} = x^{(ℓ)} − τ ∇G(x^{(ℓ)})   [explicit]
G is C¹ and ∇G is L-Lipschitz.
Theorem: if 0 < τ < 2/L, x^{(ℓ)} → x⋆ a solution.
Sub-gradient descent: x^{(ℓ+1)} = x^{(ℓ)} − τ_ℓ v^{(ℓ)},  v^{(ℓ)} ∈ ∂G(x^{(ℓ)})
Theorem: if τ_ℓ ~ 1/ℓ, x^{(ℓ)} → x⋆ a solution.
Problem: slow.
Proximal-point algorithm: x^{(ℓ+1)} = Prox_{γ_ℓ G}(x^{(ℓ)})   [implicit]
Theorem: if γ_ℓ ≥ c > 0, x^{(ℓ)} → x⋆ a solution.
Problem: Prox_{γG} hard to compute.   [Rockafellar, 70]
73. Proximal Splitting Methods
Solve min_{x ∈ H} E(x)
Problem: Prox_γE is not available.
Splitting: E(x) = F(x) + Σ_i Gi(x)
           (F smooth, Gi simple)
74. Proximal Splitting Methods
Solve min_{x ∈ H} E(x)
Problem: Prox_γE is not available.
Splitting: E(x) = F(x) + Σ_i Gi(x)
           (F smooth, Gi simple)
Iterative algorithms using only ∇F(x) and Prox_γGi(x):
Forward-Backward: solves F + G
Douglas-Rachford: solves Σ_i Gi
Primal-Dual: solves Σ_i Gi ∘ Ai
Generalized FB: solves F + Σ_i Gi
75. Smooth + Simple Splitting
Inverse problem: measurements y = K f0 + w
K: R^N → R^P, P ≪ N
Model: f0 = Ψ x0 sparse in dictionary Ψ.
Sparse recovery: f = Ψx where x solves
min_{x ∈ R^N} F(x) + λ G(x)
           (F smooth, G simple)
Data fidelity: F(x) = (1/2)||y − Φx||²,  Φ = K Ψ
Regularization: G(x) = ||x||_1 = Σ_i |x_i|
77. Forward-Backward
Fix point equation:
x⋆ ∈ argmin_x F(x) + G(x)  ⟺  0 ∈ ∇F(x⋆) + ∂G(x⋆)
⟺ (x⋆ − γ ∇F(x⋆)) ∈ x⋆ + γ ∂G(x⋆)
⟺ x⋆ = Prox_γG(x⋆ − γ ∇F(x⋆))
Forward-backward: x^{(ℓ+1)} = Prox_γG( x^{(ℓ)} − γ ∇F(x^{(ℓ)}) )
78. Forward-Backward
Fix point equation:
x⋆ ∈ argmin_x F(x) + G(x)  ⟺  0 ∈ ∇F(x⋆) + ∂G(x⋆)
⟺ (x⋆ − γ ∇F(x⋆)) ∈ x⋆ + γ ∂G(x⋆)
⟺ x⋆ = Prox_γG(x⋆ − γ ∇F(x⋆))
Forward-backward: x^{(ℓ+1)} = Prox_γG( x^{(ℓ)} − γ ∇F(x^{(ℓ)}) )
Projected gradient descent: G = ι_C
79. Forward-Backward
Fix point equation:
x⋆ ∈ argmin_x F(x) + G(x)  ⟺  0 ∈ ∇F(x⋆) + ∂G(x⋆)
⟺ (x⋆ − γ ∇F(x⋆)) ∈ x⋆ + γ ∂G(x⋆)
⟺ x⋆ = Prox_γG(x⋆ − γ ∇F(x⋆))
Forward-backward: x^{(ℓ+1)} = Prox_γG( x^{(ℓ)} − γ ∇F(x^{(ℓ)}) )
Projected gradient descent: G = ι_C
Theorem: let ∇F be L-Lipschitz.
If γ < 2/L, x^{(ℓ)} → x⋆ a solution of (⋆).   [Passty 79, Gabay 83]
80. Example: L1 Regularization
min_x (1/2)||Φx − y||² + λ||x||_1  ⟺  min_x F(x) + G(x)
F(x) = (1/2)||Φx − y||²,  ∇F(x) = Φ*(Φx − y),  L = ||Φ*Φ||
G(x) = λ||x||_1,  Prox_γG(x)_i = max(0, 1 − λγ/|x_i|) x_i
Forward-backward ⟺ iterative soft thresholding
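A minimal sketch of the resulting iterative soft thresholding scheme; the toy 100×400 Gaussian sensing matrix and 17-sparse signal are illustrative.

```python
import numpy as np

def ista(Phi, y, lam, n_iter=200):
    """Forward-backward (iterative soft thresholding) for
    min_x 1/2 ||Phi x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    gamma = 1.0 / L                          # step size gamma < 2/L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)                             # forward (gradient) step
        x = x - gamma * grad
        x = np.sign(x) * np.maximum(np.abs(x) - gamma * lam, 0)  # backward (prox) step
    return x

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 400)) / np.sqrt(100)
x0 = np.zeros(400); x0[rng.choice(400, 17, replace=False)] = 1.0
x_rec = ista(Phi, Phi @ x0, lam=0.01)
```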
81. Convergence Speed
min_x E(x) = F(x) + G(x)
∇F is L-Lipschitz. G is simple.
Theorem: if L > 0, the FB iterates x^{(ℓ)} satisfy
E(x^{(ℓ)}) − E(x⋆) ≤ C/ℓ
The constant C degrades with L.
82. Multi-steps Accelerations
Beck-Teboulle accelerated FB (FISTA): t^{(0)} = 1
x^{(ℓ+1)} = Prox_{(1/L)G}( y^{(ℓ)} − (1/L) ∇F(y^{(ℓ)}) )
t^{(ℓ+1)} = ( 1 + √(1 + 4 (t^{(ℓ)})²) ) / 2
y^{(ℓ+1)} = x^{(ℓ+1)} + ((t^{(ℓ)} − 1)/t^{(ℓ+1)}) ( x^{(ℓ+1)} − x^{(ℓ)} )
(see also Nesterov's method)
Theorem: if L > 0, E(x^{(ℓ)}) − E(x⋆) ≤ C/ℓ²
Complexity theory: optimal in a worst-case sense.
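The corresponding FISTA sketch, following the multi-step recursion above; same assumptions as the ISTA sketch (ℓ¹ regularized least squares, fixed step 1/L, names ours).

```python
import numpy as np

def fista(Phi, y, lam, n_iter=200):
    """Beck-Teboulle accelerated forward-backward (FISTA) for
    min_x 1/2 ||Phi x - y||^2 + lam ||x||_1, with O(1/l^2) objective decay."""
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1]); z = x.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = z - (1.0 / L) * (Phi.T @ (Phi @ z - y))                  # gradient step at z
        x_new = np.sign(x_new) * np.maximum(np.abs(x_new) - lam / L, 0)  # prox step
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)                      # extrapolation
        x, t = x_new, t_new
    return x
```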
83. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas-Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
84. Douglas-Rachford Scheme
min_x G1(x) + G2(x)   (⋆)
(G1, G2 simple)
Douglas-Rachford iterations:
z^{(ℓ+1)} = (1 − α/2) z^{(ℓ)} + (α/2) RProx_γG2 ∘ RProx_γG1 (z^{(ℓ)})
x^{(ℓ+1)} = Prox_γG2( z^{(ℓ+1)} )
Reflexive prox: RProx_γG(x) = 2 Prox_γG(x) − x
85. Douglas-Rachford Scheme
min_x G1(x) + G2(x)   (⋆)
(G1, G2 simple)
Douglas-Rachford iterations:
z^{(ℓ+1)} = (1 − α/2) z^{(ℓ)} + (α/2) RProx_γG2 ∘ RProx_γG1 (z^{(ℓ)})
x^{(ℓ+1)} = Prox_γG2( z^{(ℓ+1)} )
Reflexive prox: RProx_γG(x) = 2 Prox_γG(x) − x
Theorem: if 0 < α < 2 and γ > 0,
x^{(ℓ)} → x⋆ a solution of (⋆).   [Lions, Mercier, 79]
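A generic Douglas-Rachford sketch written in the equivalent sequential form of the relaxed iteration above (apply Prox_γG1, reflect, apply Prox_γG2); the prox operators are passed with the γ factor already absorbed, and the toy example (projection onto the positive orthant) is ours.

```python
import numpy as np

def douglas_rachford(prox_g1, prox_g2, z0, alpha=1.0, n_iter=200):
    """Douglas-Rachford for min G1 + G2 (both simple), in the sequential form
    equivalent to z <- (1 - a/2) z + (a/2) RProx_{gG2}(RProx_{gG1}(z))."""
    z = z0.copy()
    for _ in range(n_iter):
        x = prox_g1(z)                              # x = Prox_{gamma G1}(z)
        z = z + alpha * (prox_g2(2 * x - z) - x)    # relaxed reflected step
    return prox_g1(z)

# toy example: min 1/2||x - a||^2 + indicator of {x >= 0}
a = np.array([1.0, -2.0, 0.5])
prox_g1 = lambda z: (z + a) / 2.0          # prox of 1/2||. - a||^2 (gamma = 1)
prox_g2 = lambda z: np.maximum(z, 0)       # projection on the constraint
print(douglas_rachford(prox_g1, prox_g2, np.zeros(3)))
```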
86. DR Fix Point Equation
min_x G1(x) + G2(x)  ⟺  0 ∈ ∂(G1 + G2)(x)
⟺ ∃ z, z − x ∈ γ ∂G1(x)  and  x − z ∈ γ ∂G2(x)
⟺ x = Prox_γG1(z)  and  (2x − z) − x ∈ γ ∂G2(x)
87. DR Fix Point Equation
min_x G1(x) + G2(x)  ⟺  0 ∈ ∂(G1 + G2)(x)
⟺ ∃ z, z − x ∈ γ ∂G1(x)  and  x − z ∈ γ ∂G2(x)
⟺ x = Prox_γG1(z)  and  (2x − z) − x ∈ γ ∂G2(x)
⟺ x = Prox_γG2(2x − z) = Prox_γG2 ∘ RProx_γG1(z)
⟺ z = 2 Prox_γG2 ∘ RProx_γG1(z) − (2x − z)
⟺ z = 2 Prox_γG2 ∘ RProx_γG1(z) − RProx_γG1(z)
⟺ z = RProx_γG2 ∘ RProx_γG1(z)
⟺ z = (1 − α/2) z + (α/2) RProx_γG2 ∘ RProx_γG1(z)
88. Example: Optimal Transport on Centered Grid
min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)
C = {x = (m, ρ) : Ax = b},  b = (0, ρ0, ρ1)
A(x) = (div(x), ρ_{I0}, ρ_{I1})
[Figure: centered grid Gc with boundary densities I0, I1.]
89. Example: Optimal Transport on Centered Grid
min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)
C = {x = (m, ρ) : Ax = b},  b = (0, ρ0, ρ1)
A(x) = (div(x), ρ_{I0}, ρ_{I1})
Prox_γJ: cubic root (closed form).
90. Example: Optimal Transport on Centered Grid
min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)
C = {x = (m, ρ) : Ax = b},  b = (0, ρ0, ρ1)
A(x) = (div(x), ρ_{I0}, ρ_{I1})
Prox_γJ: cubic root (closed form).
Prox_{ι_C} = Proj_C:  Proj_C(x) = (Id − A* Δ^{-1} A) x + A* Δ^{-1} b,  Δ = A A*
→ solving a Poisson equation with boundary conditions.
91. Example: Optimal Transport on Centered Grid
min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)
C = {x = (m, ρ) : Ax = b},  b = (0, ρ0, ρ1)
A(x) = (div(x), ρ_{I0}, ρ_{I1})
Prox_γJ: cubic root (closed form).
Prox_{ι_C} = Proj_C:  Proj_C(x) = (Id − A* Δ^{-1} A) x + A* Δ^{-1} b,  Δ = A A*
→ solving a Poisson equation with boundary conditions.
Proposition: DR (α = 1) is ALG2 of [Benamou, Brenier 2000].
92. Example: Optimal Transport on Centered Grid
min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)
C = {x = (m, ρ) : Ax = b},  b = (0, ρ0, ρ1)
A(x) = (div(x), ρ_{I0}, ρ_{I1})
Prox_γJ: cubic root (closed form).
Prox_{ι_C} = Proj_C:  Proj_C(x) = (Id − A* Δ^{-1} A) x + A* Δ^{-1} b,  Δ = A A*
→ solving a Poisson equation with boundary conditions.
Proposition: DR (α = 1) is ALG2 of [Benamou, Brenier 2000].
→ Advantage: relaxation parameter α ∈ ]0, 2[.
93. Example: Constrained L1
min_{Φx = y} ||x||_1  ⟺  min_x G1(x) + G2(x)
G1(x) = ι_C(x),  C = {x : Φx = y}
Prox_γG1(x) = Proj_C(x) = x + Φ*(ΦΦ*)^{-1}(y − Φx)
G2(x) = ||x||_1,  Prox_γG2(x)_i = max(0, 1 − γ/|x_i|) x_i
→ efficient if ΦΦ* is easy to invert.
94. Example: Constrained L1
min_{Φx = y} ||x||_1  ⟺  min_x G1(x) + G2(x)
G1(x) = ι_C(x),  C = {x : Φx = y}
Prox_γG1(x) = Proj_C(x) = x + Φ*(ΦΦ*)^{-1}(y − Φx)
G2(x) = ||x||_1,  Prox_γG2(x)_i = max(0, 1 − γ/|x_i|) x_i
→ efficient if ΦΦ* is easy to invert.
Example: compressed sensing
Φ ∈ R^{100×400} Gaussian matrix
y = Φ x0,  ||x0||_0 = 17
[Figure: log10(||x^{(ℓ)}||_1 − ||x⋆||_1) versus iteration, for γ = 0.01, 1, 10.]
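A sketch of DR applied to this constrained ℓ¹ problem. The matrix size (100 × 400 Gaussian) and sparsity (17 nonzeros) mirror the example above, but the data is randomly generated here and the helper names are ours.

```python
import numpy as np

def dr_basis_pursuit(Phi, y, gamma=1.0, n_iter=500):
    """Douglas-Rachford sketch for min ||x||_1 s.t. Phi x = y:
    G1 = indicator of {Phi x = y} (prox = affine projection),
    G2 = ||.||_1 (prox = soft thresholding)."""
    pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)            # Phi^*(Phi Phi^*)^{-1}
    proj_C = lambda x: x + pinv @ (y - Phi @ x)
    soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0)
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = proj_C(z)
        z = z + soft(2 * x - z, gamma) - x
    return proj_C(z)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 400))
x0 = np.zeros(400); x0[rng.choice(400, 17, replace=False)] = rng.normal(size=17)
x_rec = dr_basis_pursuit(Phi, Phi @ x0)
```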
95. Auxiliary Variables with DR
min_x G1(x) + G2 ∘ A(x),  linear map A: H → E,  G1, G2 simple.
⟺ min_{z ∈ H × E} G(z) + ι_C(z)
G(x, y) = G1(x) + G2(y)
C = {(x, y) ∈ H × E : Ax = y}
96. Auxiliary Variables with DR
min_x G1(x) + G2 ∘ A(x),  linear map A: H → E,  G1, G2 simple.
⟺ min_{z ∈ H × E} G(z) + ι_C(z)
G(x, y) = G1(x) + G2(y)
C = {(x, y) ∈ H × E : Ax = y}
Prox_γG(x, y) = (Prox_γG1(x), Prox_γG2(y))
Prox_γι_C(x, y) = Proj_C(x, y) = (x̃, Ax̃) = (x − A*ỹ, y + ỹ)
where x̃ = (Id + A*A)^{-1}(A*y + x) and ỹ = (Id + AA*)^{-1}(Ax − y)
→ efficient if Id + AA* or Id + A*A is easy to invert.
97. Example: TV Regularization
min_f (1/2)||Kf − y||² + λ||∇f||_1,   ||u||_1 = Σ_i ||u_i||
⟺ min_{(f, u)} G1(u) + G2(f) + ι_C(f, u):
G1(u) = λ||u||_1,  Prox_γG1(u)_i = max(0, 1 − λγ/||u_i||) u_i
G2(f) = (1/2)||Kf − y||²,  Prox_γG2(f) = (Id + γK*K)^{-1}(f + γK*y)
C = {(f, u) ∈ R^N × R^{N×2} : u = ∇f}
Proj_C(f, u) = (f̃, ∇f̃)
98. Example: TV Regularization
min_f (1/2)||Kf − y||² + λ||∇f||_1,   ||u||_1 = Σ_i ||u_i||
⟺ min_{(f, u)} G1(u) + G2(f) + ι_C(f, u):
G1(u) = λ||u||_1,  Prox_γG1(u)_i = max(0, 1 − λγ/||u_i||) u_i
G2(f) = (1/2)||Kf − y||²,  Prox_γG2(f) = (Id + γK*K)^{-1}(f + γK*y)
C = {(f, u) ∈ R^N × R^{N×2} : u = ∇f}
Proj_C(f, u) = (f̃, ∇f̃)
where f̃ solves (Id + ∇*∇) f̃ = f − div(u)
→ O(N log N) operations using FFT.
100. Alternating Direction Method of Multipliers
min_x F(x) + G ∘ A(x)   (⋆)   ⟺   min_{x, y = Ax} F(x) + G(y)
A: R^N → R^P injective.
101. Alternating Direction Method of Multipliers
min_x F(x) + G ∘ A(x)   (⋆)   ⟺   min_{x, y = Ax} F(x) + G(y)
A: R^N → R^P injective.
Lagrangian: min_{x,y} max_u L(x, y, u) = F(x) + G(y) + ⟨u, y − Ax⟩
102. Alternating Direction Method of Multipliers
min_x F(x) + G ∘ A(x)   (⋆)   ⟺   min_{x, y = Ax} F(x) + G(y)
A: R^N → R^P injective.
Lagrangian: min_{x,y} max_u L(x, y, u) = F(x) + G(y) + ⟨u, y − Ax⟩
Augmented: min_{x,y} max_u Lγ(x, y, u) = L(x, y, u) + (γ/2)||y − Ax||²
103. Alternating Direction Method of Multipliers
min_x F(x) + G ∘ A(x)   (⋆)   ⟺   min_{x, y = Ax} F(x) + G(y)
A: R^N → R^P injective.
Lagrangian: min_{x,y} max_u L(x, y, u) = F(x) + G(y) + ⟨u, y − Ax⟩
Augmented: min_{x,y} max_u Lγ(x, y, u) = L(x, y, u) + (γ/2)||y − Ax||²
ADMM:
x^{(ℓ+1)} = argmin_x Lγ(x, y^{(ℓ)}, u^{(ℓ)})
y^{(ℓ+1)} = argmin_y Lγ(x^{(ℓ+1)}, y, u^{(ℓ)})
u^{(ℓ+1)} = u^{(ℓ)} + γ( y^{(ℓ+1)} − A x^{(ℓ+1)} )
104. Alternating Direction Method of Multipliers
min_x F(x) + G ∘ A(x)   (⋆)   ⟺   min_{x, y = Ax} F(x) + G(y)
A: R^N → R^P injective.
Lagrangian: min_{x,y} max_u L(x, y, u) = F(x) + G(y) + ⟨u, y − Ax⟩
Augmented: min_{x,y} max_u Lγ(x, y, u) = L(x, y, u) + (γ/2)||y − Ax||²
ADMM:
x^{(ℓ+1)} = argmin_x Lγ(x, y^{(ℓ)}, u^{(ℓ)})
y^{(ℓ+1)} = argmin_y Lγ(x^{(ℓ+1)}, y, u^{(ℓ)})
u^{(ℓ+1)} = u^{(ℓ)} + γ( y^{(ℓ+1)} − A x^{(ℓ+1)} )
Theorem: if γ > 0, x^{(ℓ)} → x⋆ a solution of (⋆).
[Gabay, Mercier, Glowinski, Marrocco, 76]
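An ADMM sketch for the Lasso-type problem min (1/2)||Φx − y||² + λ||x||₁, taking A = Id in the splitting above. The auxiliary variable is named z (rather than y, to avoid clashing with the data) and the dual variable u is kept in scaled form; these naming choices are ours.

```python
import numpy as np

def admm_lasso(Phi, y, lam, gamma=1.0, n_iter=300):
    """ADMM sketch for min_x 1/2||Phi x - y||^2 + lam ||x||_1,
    written as min F(x) + G(z) s.t. x = z, with scaled dual variable u."""
    n = Phi.shape[1]
    PtP, Pty = Phi.T @ Phi, Phi.T @ y
    M = np.linalg.inv(PtP + gamma * np.eye(n))        # x-update system matrix
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0)
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    for _ in range(n_iter):
        x = M @ (Pty + gamma * (z - u))               # argmin_x of the augmented Lagrangian
        z = soft(x + u, lam / gamma)                  # argmin_z of the augmented Lagrangian
        u = u + x - z                                 # dual update
    return x
```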
105. ADMM with Proximal Operators
Proximal mapping for the metric of A (A injective):
Prox^A_{γF}(z) = argmin_x (1/2)||Ax − z||² + γ F(x)
106. ADMM with Proximal Operators
Proximal mapping for the metric of A (A injective):
Prox^A_{γF}(z) = argmin_x (1/2)||Ax − z||² + γ F(x)
Proposition: Prox^A_{γF} = A⁺ ∘ ( Id − γ Prox_{(F* ∘ A*)/γ}(·/γ) )
107. ADMM with Proximal Operators
Proximal mapping for the metric of A (A injective):
Prox^A_{γF}(z) = argmin_x (1/2)||Ax − z||² + γ F(x)
Proposition: Prox^A_{γF} = A⁺ ∘ ( Id − γ Prox_{(F* ∘ A*)/γ}(·/γ) )
ADMM:
x^{(ℓ+1)} = Prox^A_{F/γ}( y^{(ℓ)} − u^{(ℓ)} )
y^{(ℓ+1)} = Prox_{G/γ}( A x^{(ℓ+1)} + u^{(ℓ)} )
u^{(ℓ+1)} = u^{(ℓ)} + γ( y^{(ℓ+1)} − A x^{(ℓ+1)} )
108. ADMM with Proximal Operators
Proximal mapping for the metric of A (A injective):
Prox^A_{γF}(z) = argmin_x (1/2)||Ax − z||² + γ F(x)
Proposition: Prox^A_{γF} = A⁺ ∘ ( Id − γ Prox_{(F* ∘ A*)/γ}(·/γ) )
ADMM:
x^{(ℓ+1)} = Prox^A_{F/γ}( y^{(ℓ)} − u^{(ℓ)} )
y^{(ℓ+1)} = Prox_{G/γ}( A x^{(ℓ+1)} + u^{(ℓ)} )
u^{(ℓ+1)} = u^{(ℓ)} + γ( y^{(ℓ+1)} − A x^{(ℓ+1)} )
→ If G ∘ A is simple: use DR.
→ If F* ∘ A* is simple: use ADMM.
109. ADMM vs. DR
Fenchel-Rockafellar duality:
min_x F(x) + G ∘ A(x)  ⟷  min_u F*(−A*u) + G*(u)
Important: no bijection between u and x.
110. ADMM vs. DR
Fenchel-Rockafellar duality:
min_x F(x) + G ∘ A(x)  ⟷  min_u F*(−A*u) + G*(u)
Important: no bijection between u and x.
Proposition: DR applied to F*(−A*·) + G* is ADMM.   [Eckstein, Bertsekas, 92]
111. ADMM vs. DR
Fenchel-Rockafellar duality:
min_x F(x) + G ∘ A(x)  ⟷  min_u F*(−A*u) + G*(u)
Important: no bijection between u and x.
Proposition: DR applied to F*(−A*·) + G* is ADMM.   [Eckstein, Bertsekas, 92]
DR iterations (when α = 1):
z^{(ℓ+1)} = (1/2) z^{(ℓ)} + (1/2) RProx_{γ F*(−A*·)} ∘ RProx_{γG*}( z^{(ℓ)} )
112. ADMM vs. DR
Fenchel-Rockafellar duality:
min_x F(x) + G ∘ A(x)  ⟷  min_u F*(−A*u) + G*(u)
Important: no bijection between u and x.
Proposition: DR applied to F*(−A*·) + G* is ADMM.   [Eckstein, Bertsekas, 92]
DR iterations (when α = 1):
z^{(ℓ+1)} = (1/2) z^{(ℓ)} + (1/2) RProx_{γ F*(−A*·)} ∘ RProx_{γG*}( z^{(ℓ)} )
The iterates of ADMM are recovered using:
y^{(ℓ)} = (1/γ)( z^{(ℓ)} − u^{(ℓ)} )
x^{(ℓ+1)} = Prox^A_{F/γ}( y^{(ℓ)} − u^{(ℓ)} )
u^{(ℓ)} = Prox_{γG*}( z^{(ℓ)} )
113. More than 2 Functionals
min_x G1(x) + ... + Gk(x),  each Gi is simple
⟺ min_{x1, ..., xk} G(x1, ..., xk) + ι_C(x1, ..., xk)
G(x1, ..., xk) = G1(x1) + ... + Gk(xk)
C = {(x1, ..., xk) ∈ H^k : x1 = ... = xk}
114. More than 2 Functionals
min_x G1(x) + ... + Gk(x),  each Gi is simple
⟺ min_{x1, ..., xk} G(x1, ..., xk) + ι_C(x1, ..., xk)
G(x1, ..., xk) = G1(x1) + ... + Gk(xk)
C = {(x1, ..., xk) ∈ H^k : x1 = ... = xk}
G and ι_C are simple:
Prox_γG(x1, ..., xk) = (Prox_γGi(xi))_i
Prox_γι_C(x1, ..., xk) = (x̃, ..., x̃)  where x̃ = (1/k) Σ_i xi
115. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas-Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
116. GFB Splitting
min_{x ∈ R^N} F(x) + Σ_{i=1}^n Gi(x)   (⋆)
F smooth,  Gi simple for i = 1, ..., n
z_i^{(ℓ+1)} = z_i^{(ℓ)} + Prox_{nγ Gi}( 2x^{(ℓ)} − z_i^{(ℓ)} − γ ∇F(x^{(ℓ)}) ) − x^{(ℓ)}
x^{(ℓ+1)} = (1/n) Σ_{i=1}^n z_i^{(ℓ+1)}
[Raguet, Fadili, Peyré 2012]
117. GFB Splitting
min_{x ∈ R^N} F(x) + Σ_{i=1}^n Gi(x)   (⋆)
F smooth,  Gi simple for i = 1, ..., n
z_i^{(ℓ+1)} = z_i^{(ℓ)} + Prox_{nγ Gi}( 2x^{(ℓ)} − z_i^{(ℓ)} − γ ∇F(x^{(ℓ)}) ) − x^{(ℓ)}
x^{(ℓ+1)} = (1/n) Σ_{i=1}^n z_i^{(ℓ+1)}
Theorem: let ∇F be L-Lipschitz.
If γ < 2/L, x^{(ℓ)} → x⋆ a solution of (⋆).   [Raguet, Fadili, Peyré 2012]
118. GFB Splitting
min_{x ∈ R^N} F(x) + Σ_{i=1}^n Gi(x)   (⋆)
F smooth,  Gi simple for i = 1, ..., n
z_i^{(ℓ+1)} = z_i^{(ℓ)} + Prox_{nγ Gi}( 2x^{(ℓ)} − z_i^{(ℓ)} − γ ∇F(x^{(ℓ)}) ) − x^{(ℓ)}
x^{(ℓ+1)} = (1/n) Σ_{i=1}^n z_i^{(ℓ+1)}
Theorem: let ∇F be L-Lipschitz.
If γ < 2/L, x^{(ℓ)} → x⋆ a solution of (⋆).   [Raguet, Fadili, Peyré 2012]
n = 1  ⟹  Forward-Backward.
F = 0  ⟹  Douglas-Rachford.
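A generic sketch of the GFB iteration above; each prox in prox_list is assumed to take the evaluation point and the step nγ, and grad_F and the Lipschitz constant L are supplied by the caller. Names are ours.

```python
import numpy as np

def gfb(grad_F, prox_list, x0, L, n_iter=200):
    """Generalized Forward-Backward sketch for min F + sum_i G_i,
    following the iteration above: one auxiliary variable z_i per simple
    term, prox of (n*gamma*G_i) applied to 2x - z_i - gamma*grad F(x)."""
    n = len(prox_list)
    gamma = 1.0 / L                      # step size gamma < 2/L
    x = x0.copy()
    z = [x0.copy() for _ in range(n)]
    for _ in range(n_iter):
        g = grad_F(x)
        for i, prox in enumerate(prox_list):
            z[i] = z[i] + prox(2 * x - z[i] - gamma * g, n * gamma) - x
        x = sum(z) / n
    return x
```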
119. GFB Fix Point
x⋆ ∈ argmin_{x ∈ R^N} F(x) + Σ_i Gi(x)  ⟺  0 ∈ ∇F(x⋆) + Σ_i ∂Gi(x⋆)
⟺ ∃ yi ∈ ∂Gi(x⋆),  ∇F(x⋆) + Σ_i yi = 0
120. GFB Fix Point
x⋆ ∈ argmin_{x ∈ R^N} F(x) + Σ_i Gi(x)  ⟺  0 ∈ ∇F(x⋆) + Σ_i ∂Gi(x⋆)
⟺ ∃ yi ∈ ∂Gi(x⋆),  ∇F(x⋆) + Σ_i yi = 0
⟺ ∃ (zi)_{i=1}^n,  ∀ i, (1/n)( x⋆ − zi − γ ∇F(x⋆) ) ∈ γ ∂Gi(x⋆)
and x⋆ = (1/n) Σ_i zi   (use zi = x⋆ − γ ∇F(x⋆) − nγ yi)
121. GFB Fix Point
x⋆ ∈ argmin_{x ∈ R^N} F(x) + Σ_i Gi(x)  ⟺  0 ∈ ∇F(x⋆) + Σ_i ∂Gi(x⋆)
⟺ ∃ yi ∈ ∂Gi(x⋆),  ∇F(x⋆) + Σ_i yi = 0
⟺ ∃ (zi)_{i=1}^n,  ∀ i, (1/n)( x⋆ − zi − γ ∇F(x⋆) ) ∈ γ ∂Gi(x⋆)
and x⋆ = (1/n) Σ_i zi   (use zi = x⋆ − γ ∇F(x⋆) − nγ yi)
⟺ ( 2x⋆ − zi − γ ∇F(x⋆) ) − x⋆ ∈ nγ ∂Gi(x⋆)
⟺ x⋆ = Prox_{nγ Gi}( 2x⋆ − zi − γ ∇F(x⋆) )
⟺ zi = zi + Prox_{nγ Gi}( 2x⋆ − zi − γ ∇F(x⋆) ) − x⋆
122. GFB Fix Point
x⋆ ∈ argmin_{x ∈ R^N} F(x) + Σ_i Gi(x)  ⟺  0 ∈ ∇F(x⋆) + Σ_i ∂Gi(x⋆)
⟺ ∃ yi ∈ ∂Gi(x⋆),  ∇F(x⋆) + Σ_i yi = 0
⟺ ∃ (zi)_{i=1}^n,  ∀ i, (1/n)( x⋆ − zi − γ ∇F(x⋆) ) ∈ γ ∂Gi(x⋆)
and x⋆ = (1/n) Σ_i zi   (use zi = x⋆ − γ ∇F(x⋆) − nγ yi)
⟺ ( 2x⋆ − zi − γ ∇F(x⋆) ) − x⋆ ∈ nγ ∂Gi(x⋆)
⟺ x⋆ = Prox_{nγ Gi}( 2x⋆ − zi − γ ∇F(x⋆) )
⟺ zi = zi + Prox_{nγ Gi}( 2x⋆ − zi − γ ∇F(x⋆) ) − x⋆
⟹ Fix point equation on (x⋆, z1, ..., zn).
123. Block Regularization
ℓ¹−ℓ² block sparsity: G(x) = Σ_{b ∈ B} ||x_{[b]}||,  ||x_{[b]}||² = Σ_{m ∈ b} x_m²
[Figure: image f = Ψx, coefficients x, and blocks b ∈ B.]
124. Block Regularization
ℓ¹−ℓ² block sparsity: G(x) = Σ_{b ∈ B} ||x_{[b]}||,  ||x_{[b]}||² = Σ_{m ∈ b} x_m²
Non-overlapping decomposition: B = B1 ∪ ... ∪ Bn
G(x) = Σ_{i=1}^n Gi(x),  Gi(x) = Σ_{b ∈ Bi} ||x_{[b]}||
[Figure: two shifted block decompositions B1 and B2 on the coefficient grid.]
125. Block Regularization
ℓ¹−ℓ² block sparsity: G(x) = Σ_{b ∈ B} ||x_{[b]}||,  ||x_{[b]}||² = Σ_{m ∈ b} x_m²
Non-overlapping decomposition: B = B1 ∪ ... ∪ Bn
G(x) = Σ_{i=1}^n Gi(x),  Gi(x) = Σ_{b ∈ Bi} ||x_{[b]}||
Each Gi is simple:
∀ m ∈ b ∈ Bi,  Prox_γGi(x)_m = max(0, 1 − γ/||x_{[b]}||) x_m
126. Numerical Experiments
Deconvolution: min_x (1/2)||Y − K Ψx||² + λ Σ_k ||x||_{Bk,1,2},  Ψ = TI wavelets.
Deconvolution + inpainting: min_x (1/2)||Y − P K Ψx||² + λ Σ_k ||x||_{Bk,1,2}.
[Figure: decay of log10(E(x^{(ℓ)}) − E(x⋆)) versus iteration # for EFB, PR and CP.
Deconvolution (N: 256, λ_{ℓ1/ℓ2} = 1.30e−03, noise 0.025, convol. 2): tEFB = 161s, tPR = 173s, tCP = 190s; SNR 22.49 dB after 50 iterations.
Deconvolution + inpainting (λ_{ℓ1/ℓ2} = 1.00e−03, noise 0.025, degrad. 0.4, convol. 2): tEFB = 283s, tPR = 298s, tCP = 368s; SNR 21.80 dB after 50 iterations.
Observations y = Φ x0 + w and recovered x⋆.]
127. Overview
• Optimal Transport and Imaging
• Convex Analysis and Proximal Calculus
• Forward-Backward
• Douglas-Rachford and ADMM
• Generalized Forward-Backward
• Primal-Dual Schemes
129. Primal-dual Formulation
Fenchel-Rockafellar duality: A: H → L linear
min_{x ∈ H} G1(x) + G2 ∘ A(x) = min_x G1(x) + sup_{u ∈ L} ⟨Ax, u⟩ − G2*(u)
Strong duality: 0 ∈ ri(dom(G2)) − A ri(dom(G1))
(min ⟷ max)  = max_u −G2*(u) + min_x G1(x) + ⟨x, A*u⟩
             = max_u −G2*(u) − G1*(−A*u)
130. Primal-dual Formulation
Fenchel-Rockafellar duality: A: H → L linear
min_{x ∈ H} G1(x) + G2 ∘ A(x) = min_x G1(x) + sup_{u ∈ L} ⟨Ax, u⟩ − G2*(u)
Strong duality: 0 ∈ ri(dom(G2)) − A ri(dom(G1))
(min ⟷ max)  = max_u −G2*(u) + min_x G1(x) + ⟨x, A*u⟩
             = max_u −G2*(u) − G1*(−A*u)
Recovering x⋆ from some u⋆:
x⋆ ∈ argmin_x G1(x) + ⟨x, A*u⋆⟩
131. Primal-dual Formulation
Fenchel-Rockafellar duality: A: H → L linear
min_{x ∈ H} G1(x) + G2 ∘ A(x) = min_x G1(x) + sup_{u ∈ L} ⟨Ax, u⟩ − G2*(u)
Strong duality: 0 ∈ ri(dom(G2)) − A ri(dom(G1))
(min ⟷ max)  = max_u −G2*(u) + min_x G1(x) + ⟨x, A*u⟩
             = max_u −G2*(u) − G1*(−A*u)
Recovering x⋆ from some u⋆:
x⋆ ∈ argmin_x G1(x) + ⟨x, A*u⋆⟩
⟺ −A*u⋆ ∈ ∂G1(x⋆)
⟺ x⋆ ∈ (∂G1)^{-1}(−A*u⋆) = ∂G1*(−A*u⋆)
132. Forward-Backward on the Dual
If G1 is strongly convex: ∇²G1 ≥ c Id
G1(tx + (1−t)y) ≤ t G1(x) + (1−t) G1(y) − (c/2) t(1−t) ||x − y||²
133. Forward-Backward on the Dual
If G1 is strongly convex: ∇²G1 ≥ c Id
G1(tx + (1−t)y) ≤ t G1(x) + (1−t) G1(y) − (c/2) t(1−t) ||x − y||²
⟹ x⋆ is uniquely defined,
x⋆ = ∇G1*(−A*u⋆),
and G1* is of class C¹.
134. Forward-Backward on the Dual
If G1 is strongly convex: ∇²G1 ≥ c Id
G1(tx + (1−t)y) ≤ t G1(x) + (1−t) G1(y) − (c/2) t(1−t) ||x − y||²
⟹ x⋆ is uniquely defined,
x⋆ = ∇G1*(−A*u⋆),
and G1* is of class C¹.
FB on the dual:
min_{x ∈ H} G1(x) + G2 ∘ A(x)  ⟷  min_{u ∈ L} G1*(−A*u) + G2*(u)
                                   (smooth + simple)
u^{(ℓ+1)} = Prox_{τ G2*}( u^{(ℓ)} + τ A ∇G1*(−A*u^{(ℓ)}) )
135. Example: TV Denoising
min_{f ∈ R^N} (1/2)||f − y||² + λ||∇f||_1  ⟷  min_{||u||_∞ ≤ λ} ||y + div(u)||²
||u||_1 = Σ_i ||u_i||,   ||u||_∞ = max_i ||u_i||
Dual solution u⋆  ⟹  primal solution f⋆ = y + div(u⋆)   [Chambolle 2004]
136. Example: TV Denoising
min_{f ∈ R^N} (1/2)||f − y||² + λ||∇f||_1  ⟷  min_{||u||_∞ ≤ λ} ||y + div(u)||²
||u||_1 = Σ_i ||u_i||,   ||u||_∞ = max_i ||u_i||
Dual solution u⋆  ⟹  primal solution f⋆ = y + div(u⋆)   [Chambolle 2004]
FB (aka projected gradient descent):
u^{(ℓ+1)} = Proj_{||·||_∞ ≤ λ}( u^{(ℓ)} + τ ∇( y + div(u^{(ℓ)}) ) )
v = Proj_{||·||_∞ ≤ λ}(u):  v_i = u_i / max(||u_i||/λ, 1)
Convergence if τ < 2/||div ∘ ∇|| = 1/4
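A 1-D sketch of the dual projected gradient above (the slides treat 2-D images); D is a forward-difference operator, div = −Dᵀ, and the step τ = 0.24 satisfies τ < 2/||DDᵀ|| in 1-D. Names and the toy signal are ours.

```python
import numpy as np

def tv_denoise_dual(y, lam, n_iter=500):
    """1-D sketch of the dual projected gradient for
    min_f 1/2||f - y||^2 + lam ||D f||_1, D = forward differences.
    The dual variable u satisfies |u_i| <= lam and f = y - D^T u."""
    D = lambda f: np.diff(f)                                         # gradient
    Dt = lambda u: np.concatenate(([-u[0]], -np.diff(u), [u[-1]]))   # D^T = -div
    tau = 0.24                                                       # tau < 2 / ||D D^T||
    u = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        f = y - Dt(u)
        u = np.clip(u + tau * D(f), -lam, lam)                       # projected gradient step
    return y - Dt(u)

t = np.linspace(0, 1, 200)
y = (t > 0.5).astype(float) + 0.1 * np.random.default_rng(0).normal(size=200)
f_denoised = tv_denoise_dual(y, lam=0.3)
```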
137. Primal-Dual Algorithm
min_{x ∈ H} G1(x) + G2 ∘ A(x)
⟺ min_x max_z G1(x) − G2*(z) + ⟨A(x), z⟩
138. Primal-Dual Algorithm
min_{x ∈ H} G1(x) + G2 ∘ A(x)
⟺ min_x max_z G1(x) − G2*(z) + ⟨A(x), z⟩
z^{(ℓ+1)} = Prox_{σG2*}( z^{(ℓ)} + σ A(x̃^{(ℓ)}) )
x^{(ℓ+1)} = Prox_{τG1}( x^{(ℓ)} − τ A*(z^{(ℓ+1)}) )
x̃^{(ℓ+1)} = x^{(ℓ+1)} + θ( x^{(ℓ+1)} − x^{(ℓ)} )
θ = 0: Arrow-Hurwicz algorithm.
θ = 1: convergence speed on the duality gap.
139. Primal-Dual Algorithm
min_{x ∈ H} G1(x) + G2 ∘ A(x)
⟺ min_x max_z G1(x) − G2*(z) + ⟨A(x), z⟩
z^{(ℓ+1)} = Prox_{σG2*}( z^{(ℓ)} + σ A(x̃^{(ℓ)}) )
x^{(ℓ+1)} = Prox_{τG1}( x^{(ℓ)} − τ A*(z^{(ℓ+1)}) )
x̃^{(ℓ+1)} = x^{(ℓ+1)} + θ( x^{(ℓ+1)} − x^{(ℓ)} )
θ = 0: Arrow-Hurwicz algorithm.
θ = 1: convergence speed on the duality gap.
Theorem [Chambolle-Pock 2011]:
if 0 ≤ θ ≤ 1 and στ ||A||² < 1, then
x^{(ℓ)} → x⋆ a minimizer of G1 + G2 ∘ A.
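A generic sketch of the primal-dual iteration above: the caller supplies Prox_{τG1}, Prox_{σG2*}, A and its adjoint, with σ and τ chosen so that στ||A||² < 1. For the TV denoising example, Prox_{σG2*} would be the projection onto {||z||_∞ ≤ λ} and A = ∇. Names are ours.

```python
import numpy as np

def chambolle_pock(prox_tau_G1, prox_sigma_G2_star, A, At, x0, sigma, tau,
                   theta=1.0, n_iter=300):
    """Generic Chambolle-Pock primal-dual sketch for min_x G1(x) + G2(A x);
    requires sigma * tau * ||A||^2 < 1."""
    x = x0.copy()
    x_bar = x0.copy()
    z = A(x0) * 0.0                                      # dual variable
    for _ in range(n_iter):
        z = prox_sigma_G2_star(z + sigma * A(x_bar))     # dual step
        x_new = prox_tau_G1(x - tau * At(z))             # primal step
        x_bar = x_new + theta * (x_new - x)              # extrapolation
        x = x_new
    return x
```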
140. Example: Optimal Transport
Staggered grid formulation:
min_{x ∈ R^{G¹st} × R^{G²st}} J(I(x)) + ι_C(x)
Interpolation: I = (I¹, I²): R^{G¹st} × R^{G²st} → R^{Gc}
[Figure: staggered grids G¹st, G²st mapped by I onto the centered grid Gc in the (t, s) domain.]
141. Conclusion
Inverse problems in imaging:
Large scale, N ~ 10^6.
Non-smooth (sparsity, TV, ...).
(Sometimes) convex.
Highly structured (separability, ℓ^p norms, ...).
142. Conclusion
Inverse problems in imaging:
Large scale, N ~ 10^6.
Non-smooth (sparsity, TV, ...).
(Sometimes) convex.
Highly structured (separability, ℓ^p norms, ...).
Proximal splitting:
Unravel the structure of problems.
Parallelizable.
Decomposition G = Σ_k Gk.
143. Conclusion
Inverse problems in imaging:
Large scale, N ~ 10^6.
Non-smooth (sparsity, TV, ...).
(Sometimes) convex.
Highly structured (separability, ℓ^p norms, ...).
Proximal splitting:
Unravel the structure of problems.
Parallelizable.
Decomposition G = Σ_k Gk.
Open problems:
Less structured problems without smoothness.
Non-convex optimization.