1. Models from Data: a Unifying Picture
Johan Suykens
KU Leuven, ESAT-SCD/SISTA
Kasteelpark Arenberg 10
B-3001 Leuven (Heverlee), Belgium
Email: johan.suykens@esat.kuleuven.be
http://www.esat.kuleuven.be/scd/
Grand Challenges of Computational Intelligence
Nicosia Cyprus, Sept. 14, 2012
Models from data: a unifying picture - Johan Suykens
5. High-quality predictive models are crucial
biomedical, bio-informatics, process industry, energy, brain-computer interfaces, traffic networks
6. Classical Neural Networks
[Figure: one-hidden-layer network with inputs x1, ..., xn, weights w1, ..., wn, bias b, activation h(·), output y]
Multilayer Perceptron (MLP) properties:
+ Universal approximation
+ Learning from input-output patterns: off-line & on-line
+ Parallel network architecture, multiple inputs and outputs
+ Flexible and widely applicable:
feedforward & recurrent networks, supervised & unsupervised learning
- Many local minima, trial and error for the number of neurons
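The network sketched above can be written out in a few lines of NumPy. This is a minimal illustrative forward pass of a one-hidden-layer MLP (tanh is one common choice for h(·)); it is not the training procedure and the sizes are arbitrary:

```python
import numpy as np

def mlp_forward(x, W1, b1, w2, b2):
    """One-hidden-layer MLP: y = w2^T h(W1 x + b1) + b2, with h = tanh."""
    h = np.tanh(W1 @ x + b1)  # hidden-layer activations h(.)
    return w2 @ h + b2        # linear output neuron

rng = np.random.default_rng(0)
n, nh = 3, 5                  # n inputs x1..xn, nh hidden neurons
W1 = rng.normal(size=(nh, n))
b1 = rng.normal(size=nh)
w2 = rng.normal(size=nh)
y = mlp_forward(rng.normal(size=n), W1, b1, w2, b2=0.0)
```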
Models from data: a unifying picture - Johan Suykens 2
7. Support Vector Machines
cost function cost function
MLP SVM
weights weights
• Nonlinear classification and function estimation by convex optimization
• Learning and generalization in high dimensional input spaces
• Use of kernels:
- linear, polynomial, RBF, MLP, splines, kernels from graphical models,...
- application-specific kernels: e.g. bioinformatics, textmining
[Vapnik, 1995; Sch¨lkopf & Smola, 2002; Shawe-Taylor & Cristianini, 2004]
o
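For concreteness, the first three kernels in the list can be computed as follows; a small NumPy sketch, with illustrative bandwidth and degree values:

```python
import numpy as np

def linear_kernel(X, Y):
    return X @ Y.T

def poly_kernel(X, Y, degree=3, c=1.0):
    return (X @ Y.T + c) ** degree

def rbf_kernel(X, Y, sigma=1.0):
    # pairwise squared distances: ||x||^2 + ||y||^2 - 2 x^T y
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

X = np.array([[0.0, 1.0], [1.0, 0.0]])
K = rbf_kernel(X, X)  # symmetric positive semidefinite Gram matrix
```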
8. Kernel-based models: different views
SVM Some early history on RKHS:
LS−SVM
1910-1920: Moore
1940: Aronszajn
Kriging RKHS
1951: Krige
1970: Parzen
Gaussian Processes 1971: Kimeldorf & Wahba
Complementary insights from different perspectives:
kernels are used in different methodologies
- Support vector machines (SVM): optimization approach (primal/dual)
- Reproducing kernel Hilbert spaces (RKHS): variational problem, functional analysis
- Gaussian processes (GP): probabilistic/Bayesian approach
9. SVMs: living in two worlds ...
[Figure: input space (x, o classes), feature space ϕ(x), and the network views of both model representations]
Primal space (parametric): ŷ = sign[w^T ϕ(x) + b]
Dual space (nonparametric): ŷ = sign[Σ_{i=1}^{#sv} α_i y_i K(x, x_i) + b]
K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j) ("kernel trick")
12. Fitting models to data: alternative views
- Consider the model ŷ = f(x; w), given input/output data {(x_i, y_i)}_{i=1}^N:
  min_w  w^T w + γ Σ_{i=1}^N (y_i − f(x_i; w))^2
- Rewrite the problem as
  min_{w,e}  w^T w + γ Σ_{i=1}^N e_i^2
  subject to  e_i = y_i − f(x_i; w),  i = 1, ..., N
- Construct the Lagrangian and take the conditions for optimality
- Express the solution and the model in terms of the Lagrange multipliers
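Carrying out these steps for a linear model f(x; w) = w^T x (bias omitted for brevity) gives a concrete check: eliminating w and e from the optimality conditions of the Lagrangian yields a linear system in the multipliers α, and w = Σ_i α_i x_i reproduces the primal solution. A NumPy sketch, illustrative rather than taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, gamma = 8, 3, 10.0
X = rng.normal(size=(N, d))
y = rng.normal(size=N)

# Primal: min_w  w^T w + gamma * sum_i (y_i - w^T x_i)^2
w_primal = np.linalg.solve(X.T @ X + np.eye(d) / gamma, X.T @ y)

# Dual: from the Lagrangian of the constrained form (e_i = y_i - w^T x_i),
# the optimality conditions give w = X^T alpha and (K + I/gamma) alpha = y
K = X @ X.T
alpha = np.linalg.solve(K + np.eye(N) / gamma, y)
w_dual = X.T @ alpha
```

Both routes give the same model, which is the point of the "alternative views" on this slide.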
14. Linear model: solving in primal or dual?
inputs x ∈ R^d, output y ∈ R, training set {(x_i, y_i)}_{i=1}^N
Model:
(P): ŷ = w^T x + b,  w ∈ R^d
(D): ŷ = Σ_i α_i x_i^T x + b,  α ∈ R^N
15. Linear model: solving in primal or dual?
few inputs, many data points (e.g. 20 × 1,000,000):
primal: w ∈ R^20
dual: α ∈ R^1,000,000 (kernel matrix: 1,000,000 × 1,000,000)
16. Linear model: solving in primal or dual?
many inputs, few data points (e.g. 10,000 × 50):
primal: w ∈ R^10,000
dual: α ∈ R^50 (kernel matrix: 50 × 50)
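The size trade-off on these two slides can be sketched directly: the primal solve involves a d × d system, the dual solve an N × N kernel matrix, so one picks whichever is smaller. A hedged NumPy illustration with a linear kernel and the bias term omitted:

```python
import numpy as np

def fit_primal(X, y, gamma):
    # d x d system: worthwhile when d << N (few inputs, many data points)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + np.eye(d) / gamma, X.T @ y)

def fit_dual(X, y, gamma):
    # N x N kernel system: worthwhile when N << d (many inputs, few points)
    N = X.shape[0]
    return np.linalg.solve(X @ X.T + np.eye(N) / gamma, y)

rng = np.random.default_rng(2)
# few inputs, many data points: the primal system is only 5 x 5
X1, y1 = rng.normal(size=(2000, 5)), rng.normal(size=2000)
w = fit_primal(X1, y1, gamma=1.0)
# many inputs, few data points: the dual kernel matrix is only 50 x 50
X2, y2 = rng.normal(size=(50, 2000)), rng.normal(size=50)
alpha = fit_dual(X2, y2, gamma=1.0)
```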
17. Least Squares Support Vector Machines: "core models"
• Regression
  min_{w,b,e}  w^T w + γ Σ_i e_i^2   s.t.  y_i = w^T ϕ(x_i) + b + e_i, ∀i
• Classification
  min_{w,b,e}  w^T w + γ Σ_i e_i^2   s.t.  y_i (w^T ϕ(x_i) + b) = 1 − e_i, ∀i
• Kernel PCA (V = I), kernel spectral clustering (V = D^{−1})
  min_{w,b,e}  −w^T w + γ Σ_i v_i e_i^2   s.t.  e_i = w^T ϕ(x_i) + b, ∀i
• Kernel canonical correlation analysis / partial least squares
  min_{w,v,b,d,e,r}  w^T w + v^T v + ν Σ_i (e_i − r_i)^2   s.t.  e_i = w^T ϕ_1(x_i) + b,  r_i = v^T ϕ_2(y_i) + d
[Suykens & Vandewalle, 1999; Suykens et al., 2002; Alzate & Suykens, 2010]
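The regression core model has a well-known dual: eliminating w and e from the optimality conditions gives the linear system [[0, 1^T]; [1, K + I/γ]] [b; α] = [0; y], with prediction ŷ(x) = Σ_i α_i K(x, x_i) + b. A small NumPy sketch with an RBF kernel; the γ and σ values are illustrative:

```python
import numpy as np

def rbf(A, B, sigma):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / sigma**2)

def lssvm_regression_fit(X, y, gamma, sigma):
    # Dual of: min w^T w + gamma * sum e_i^2  s.t.  y_i = w^T phi(x_i) + b + e_i,
    # i.e. the linear system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[1:], sol[0]  # alpha, b

def lssvm_predict(Xq, X, alpha, b, sigma):
    return rbf(Xq, X, sigma) @ alpha + b

X = np.linspace(-3, 3, 40)[:, None]
y = np.sinc(X[:, 0])
alpha, b = lssvm_regression_fit(X, y, gamma=100.0, sigma=1.0)
yhat = lssvm_predict(X, X, alpha, b, sigma=1.0)
```

With a large γ the fit is close to interpolating; smaller γ trades training error for smoothness.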
20. Core models
[Diagram: a core model (parametric model, support vector machine, least-squares support vector machine, Parzen kernel model) plus regularization terms and additional constraints leads to the optimal model representation and the model estimate]
22. Kernel spectral clustering
• Underlying model: ê_* = w^T ϕ(x_*),
  with q̂_* = sign[ê_*] the estimated cluster indicator at any x_* ∈ R^d.
• Primal problem: training on given data {x_i}_{i=1}^N
  min_{w,e}  −(1/2) w^T w + (γ/2) Σ_{i=1}^N v_i e_i^2
  subject to  e_i = w^T ϕ(x_i),  i = 1, ..., N
  with weights v_i (related to the inverse degree matrix: V = D^{−1}).
• Dual problem:
  Ω α = λ D α
  with Ω_ij = K(x_i, x_j) = ϕ(x_i)^T ϕ(x_j).
• Kernel spectral clustering [Alzate & Suykens, IEEE-PAMI, 2010], related to
  spectral clustering [Fiedler, 1973; Shi & Malik, 2000; Ng et al., 2002; Chung, 1997]
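A simplified numerical sketch of the dual eigenvalue problem Ω α = λ D α, solved here via the matrix D^{−1} Ω (close in spirit to random-walk spectral clustering rather than the full weighted KSC formulation); cluster memberships follow from the sign of an eigenvector. The data and kernel bandwidth are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
# two well-separated Gaussian clusters of 20 points each
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])

# RBF kernel matrix Omega and degree matrix D
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Omega = np.exp(-sq / (2 * 0.5**2))
D = np.diag(Omega.sum(axis=1))

# dual eigenvalue problem Omega alpha = lambda D alpha,
# solved through the equivalent eigenproblem of D^{-1} Omega
evals, evecs = np.linalg.eig(np.linalg.inv(D) @ Omega)
order = np.argsort(-evals.real)
alpha = evecs[:, order[1]].real   # 2nd eigenvector; the 1st is constant
labels = np.sign(alpha)           # sign-based cluster indicators
```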
23. Primal and dual model representations
bias term b_l for centering; k clusters, k − 1 sets of constraints (index l = 1, ..., k − 1)
Model:
(P): sign[ê_*^{(l)}] = sign[w^{(l)T} ϕ(x_*) + b_l]
(D): sign[ê_*^{(l)}] = sign[Σ_j α_j^{(l)} K(x_*, x_j) + b_l]
Advantages:
- out-of-sample extensions
- model selection
- solving large scale problems
31. Highly sparse kernel models: image segmentation
[Figure: 3D scatter plot of the projections (e_i^{(1)}, e_i^{(2)}, e_i^{(3)}), with the support vectors marked *]
only 3k = 12 support vectors [Alzate & Suykens, Neurocomputing, 2011]
32. Kernel spectral clustering: adding prior knowledge
• Pair of points x†, x‡: c = 1 must-link, c = −1 cannot-link
• Primal problem [Alzate & Suykens, IJCNN 2009]
  min_{w^{(l)}, e^{(l)}, b_l}  −(1/2) Σ_{l=1}^{k−1} w^{(l)T} w^{(l)} + (1/2) Σ_{l=1}^{k−1} γ_l e^{(l)T} D^{−1} e^{(l)}
  subject to  e^{(1)} = Φ_{N×n_h} w^{(1)} + b_1 1_N
              ...
              e^{(k−1)} = Φ_{N×n_h} w^{(k−1)} + b_{k−1} 1_N
              w^{(1)T} ϕ(x†) = c w^{(1)T} ϕ(x‡)
              ...
              w^{(k−1)T} ϕ(x†) = c w^{(k−1)T} ϕ(x‡)
• Dual problem: yields a rank-one downdate of the kernel matrix
33. Kernel spectral clustering: example
original image without constraints
34. Kernel spectral clustering: example
original image with constraints
35. Hierarchical kernel spectral clustering
Hierarchical kernel spectral clustering:
- looking at different scales
- use of model selection and validation data
[Alzate & Suykens, Neural Networks, 2012]
36. Power grid: kernel spectral clustering of time-series
[Figure: three representative normalized daily load profiles (normalized load vs. hour of day)]
Electricity load: 245 substations in the Belgian grid (1/2 train, 1/2 validation)
x_i ∈ R^d with d = 43,824: spectral clustering on high dimensional data (5 years)
3 of 7 detected clusters:
- 1: residential profile: morning and evening peaks
- 2: business profile: peaked around noon
- 3: industrial profile: increasing morning, oscillating afternoon and evening
[Alzate, Espinoza, De Moor, Suykens, 2009]
37. Dimensionality reduction and data visualization
• Traditionally:
commonly used techniques are e.g. principal component analysis (PCA),
multi-dimensional scaling (MDS), self-organizing maps (SOM)
• More recently:
isomap, locally linear embedding (LLE), Hessian locally linear embedding,
diffusion maps, Laplacian eigenmaps
(“kernel eigenmap methods and manifold learning”)
[Roweis & Saul, 2000; Coifman et al., 2005; Belkin et al., 2006]
• Kernel maps with reference point [Suykens, IEEE-TNN 2008]:
data visualization and dimensionality reduction by solving linear system
40. Kernel maps with reference point: formulation
• Kernel maps with reference point [Suykens, IEEE-TNN 2008]:
- LS-SVM core part: realize the dimensionality reduction x → z
- Regularization term: (z − P_D z)^T (z − P_D z) = Σ_{i=1}^N ‖z_i − Σ_{j=1}^N s_ij D z_j‖_2^2
  with D a diagonal matrix and s_ij = exp(−‖x_i − x_j‖_2^2 / σ^2)
- Reference point q (e.g. the first point; sacrificed in the visualization)
• Example: d = 2
  min_{z, w_1, w_2, b_1, b_2, e}  (1/2)(z − P_D z)^T (z − P_D z) + (ν/2)(w_1^T w_1 + w_2^T w_2) + (η/2) Σ_{i=1}^N (e_{i,1}^2 + e_{i,2}^2)
  such that  c_{1,1}^T z = q_1 + e_{1,1}
             c_{1,2}^T z = q_2 + e_{1,2}
             c_{i,1}^T z = w_1^T ϕ_1(x_i) + b_1 + e_{i,1},  ∀i = 2, ..., N
             c_{i,2}^T z = w_2^T ϕ_2(x_i) + b_2 + e_{i,2},  ∀i = 2, ..., N
Coordinates in the low dimensional space: z = [z_1; z_2; ...; z_N] ∈ R^{dN}
41. Kernel maps: spiral example
[Figure: 3D spiral data (x_1, x_2, x_3) and the 2D embeddings (ẑ_1, ẑ_2) obtained with reference points q = [+1; −1] and q = [−1; −1]; training data (blue *), validation data (magenta o), test data (red +)]
Model selection:  min Σ_{i,j} ( ẑ_i^T ẑ_j / (‖ẑ_i‖_2 ‖ẑ_j‖_2) − x_i^T x_j / (‖x_i‖_2 ‖x_j‖_2) )^2
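The model-selection criterion compares pairwise cosine similarities in the embedding with those in the input space. A small sketch of evaluating it; the candidate embedding here is an arbitrary illustration:

```python
import numpy as np

def cosine_sims(A):
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    return U @ U.T

def selection_criterion(Z, X):
    # sum over pairs (i, j) of squared differences between cosine
    # similarities of the embeddings z and of the inputs x
    return ((cosine_sims(Z) - cosine_sims(X)) ** 2).sum()

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))
Z = rng.normal(size=(30, 2))       # a hypothetical candidate embedding
score = selection_criterion(Z, X)  # lower is better
```

In model selection one would evaluate this score on validation data for each candidate hyperparameter setting and keep the minimizer.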
42. Kernel maps: visualizing gene distribution
[Figure: 3D projection (z_1, z_2, z_3) of the gene distribution]
Alon colon cancer microarray data set: 3D projections
Dimension of the input space: 62
Number of genes: 1500 (training: 500, validation: 500, test: 500)
43. Kernels & Tensors
neuroscience: EEG data (time samples × frequency × electrodes)
computer vision: image (/video) compression/completion/· · · (pixel × illumination × expression × · · ·)
web mining: analyzing user behavior (users × queries × webpages)
vector x → matrix X → tensor X
- Naive kernel: K(X, Y) = exp(−‖vec(X) − vec(Y)‖_2^2 / σ^2)
- Tensorial kernel exploiting structure: learning from few examples
[Signoretto et al., Neural Networks, 2011 & IEEE-TSP, 2012]
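The naive kernel simply vectorizes the tensors and applies a standard RBF, discarding the multilinear (mode) structure that the tensorial kernels exploit. A minimal sketch, with EEG-like dimensions chosen purely for illustration:

```python
import numpy as np

def naive_tensor_kernel(Xt, Yt, sigma=1.0):
    # vectorize both tensors and apply an RBF on the flattened vectors;
    # the multilinear structure of the data is ignored
    d = Xt.ravel() - Yt.ravel()
    return np.exp(-(d @ d) / sigma**2)

# e.g. two tensors of shape time samples x frequency x electrodes
A = np.zeros((4, 3, 2))
B = A.copy()
B[0, 0, 0] = 1.0
k_same = naive_tensor_kernel(A, A)  # identical tensors
k_diff = naive_tensor_kernel(A, B)  # tensors differing in one entry
```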
44. Tensor completion
Mass spectral imaging: sagittal section mouse brain [data: E. Waelkens, R. Van de Plas]
Tensor completion using nuclear norm regularization [Signoretto et al., IEEE-SPL, 2011]
46. Challenges for Computational Intelligence
- Bridging gaps between advanced methods and end-users
- New mathematical and methodological frameworks
- Scalable algorithms towards large and high-dimensional data
47. Acknowledgements
• Colleagues at ESAT-SCD (especially the research units: systems, models, control - biomedical data processing - bioinformatics):
C. Alzate, A. Argyriou, J. De Brabanter, K. De Brabanter, B. De Moor, M. Espinoza, T. Falck, D. Geebelen, X. Huang, V. Jumutc, P. Karsmakers, R. Langone, J. Lopez, J. Luts, R. Mall, S. Mehrkanoon, Y. Moreau, K. Pelckmans, J. Puertas, L. Shi, M. Signoretto, V. Van Belle, R. Van de Plas, S. Van Huffel, J. Vandewalle, C. Varon, S. Yu, and others
• Many people for joint work, discussions, invitations, organizations
• Support from ERC AdG A-DATADRIVE-B, KU Leuven, GOA-MaNet, COE Optimization in Engineering OPTEC, IUAP DYSCO, FWO projects, IWT, IBBT eHealth, COST