Models from Data: a Unifying Picture
                       Johan Suykens

          KU Leuven, ESAT-SCD/SISTA
              Kasteelpark Arenberg 10
        B-3001 Leuven (Heverlee), Belgium
       Email: johan.suykens@esat.kuleuven.be
        http://www.esat.kuleuven.be/scd/

  Grand Challenges of Computational Intelligence
          Nicosia, Cyprus, Sept. 14, 2012




Overview



• Models from data

• Examples

• Challenges for computational intelligence
High-quality predictive models are crucial

Application domains: biomedical, bio-informatics, process industry, energy,
brain-computer interfaces, traffic networks
Classical Neural Networks
[diagram: MLP with inputs x_1, ..., x_n, weights w_1, ..., w_n, bias b, hidden activations h(·), and output y]

Multilayer Perceptron (MLP) properties:
- Universal approximation
- Learning from input-output patterns: off-line & on-line
- Parallel network architecture, multiple inputs and outputs

  +     Flexible and widely applicable:
        Feedforward & recurrent networks, supervised & unsupervised learning

  -    Many local minima; number of hidden neurons chosen by trial and error

Support Vector Machines
[figure: cost function versus weights — many local minima for the MLP, a single convex minimum for the SVM]

• Nonlinear classification and function estimation by convex optimization
• Learning and generalization in high dimensional input spaces
• Use of kernels:
  - linear, polynomial, RBF, MLP, splines, kernels from graphical models,...
  - application-specific kernels: e.g. bioinformatics, text mining

         [Vapnik, 1995; Schölkopf & Smola, 2002; Shawe-Taylor & Cristianini, 2004]

Kernel-based models: different views

[diagram: SVM, LS-SVM, Kriging and Gaussian processes as interconnected views around RKHS]

Some early history on RKHS:
1910-1920: Moore
1940: Aronszajn
1951: Krige
1970: Parzen
1971: Kimeldorf & Wahba

               Complementary insights from different perspectives:
                  kernels are used in different methodologies
- Support vector machines (SVM):            optimization approach (primal/dual)
- Reproducing kernel Hilbert spaces (RKHS): variational problem, functional analysis
- Gaussian processes (GP):                  probabilistic/Bayesian approach

SVMs: living in two worlds ...

Primal space (parametric):

      ŷ = sign[w^T φ(x) + b]

[diagram: input space mapped by φ(x) to the feature space; network with hidden features φ_1(x), ..., φ_{n_h}(x) and weights w_1, ..., w_{n_h}]

Kernel trick: K(x_i, x_j) = φ(x_i)^T φ(x_j)

Dual space (nonparametric):

      ŷ = sign[ Σ_{i=1}^{#sv} α_i y_i K(x, x_i) + b ]

[diagram: network with kernel units K(x, x_1), ..., K(x, x_#sv) and weights α_1, ..., α_{#sv}]
Fitting models to data: alternative views

- Consider model ŷ = f(x; w), given input/output data {(x_i, y_i)}_{i=1}^N:

      min_w  w^T w + γ Σ_{i=1}^N (y_i − f(x_i; w))²

- Rewrite the problem as

      min_{w,e}  w^T w + γ Σ_{i=1}^N e_i²
      subject to  e_i = y_i − f(x_i; w), i = 1, ..., N

- Construct the Lagrangian and take the conditions for optimality

- Express the solution and the model in terms of the Lagrange multipliers
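
Taking f(x; w) = w^T φ(x) + b, the LS-SVM regression form used later in the talk, the derivation can be made explicit. A sketch, using the common convention with factors 1/2 (the slides omit them; this only rescales γ and α):

```latex
% Lagrangian of  min_{w,b,e} (1/2) w^T w + (gamma/2) sum_i e_i^2
%                s.t.  y_i = w^T phi(x_i) + b + e_i
\mathcal{L}(w,b,e;\alpha) = \tfrac{1}{2} w^T w + \tfrac{\gamma}{2} \sum_{i=1}^N e_i^2
  - \sum_{i=1}^N \alpha_i \left( w^T \varphi(x_i) + b + e_i - y_i \right)

% Conditions for optimality:
\partial\mathcal{L}/\partial w = 0      \;\Rightarrow\; w = \textstyle\sum_i \alpha_i \varphi(x_i)
\partial\mathcal{L}/\partial b = 0      \;\Rightarrow\; \textstyle\sum_i \alpha_i = 0
\partial\mathcal{L}/\partial e_i = 0    \;\Rightarrow\; \alpha_i = \gamma e_i
\partial\mathcal{L}/\partial \alpha_i = 0 \;\Rightarrow\; y_i = w^T \varphi(x_i) + b + e_i

% Eliminating w and e gives a linear system in (b, alpha),
% with Omega_{ij} = K(x_i, x_j) = phi(x_i)^T phi(x_j):
\begin{bmatrix} 0 & 1_N^T \\ 1_N & \Omega + I/\gamma \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix} =
\begin{bmatrix} 0 \\ y \end{bmatrix}
```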
Linear model: solving in primal or dual?

inputs x ∈ R^d, output y ∈ R
training set {(x_i, y_i)}_{i=1}^N

Model:
      (P):  ŷ = w^T x + b,               w ∈ R^d
      (D):  ŷ = Σ_i α_i x_i^T x + b,     α ∈ R^N
Linear model: solving in primal or dual?

few inputs, many data points (e.g. 20 × 1,000,000):

      primal: w ∈ R^20
      dual:   α ∈ R^1,000,000   (kernel matrix: 1,000,000 × 1,000,000)
Linear model: solving in primal or dual?

many inputs, few data points (e.g. 10,000 × 50):

      primal: w ∈ R^10,000
      dual:   α ∈ R^50   (kernel matrix: 50 × 50)
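
To make the trade-off concrete, a minimal sketch for linear ridge regression (bias term omitted): the primal solve is a d × d system, the dual solve an N × N system, and both give the same predictor.

```python
import numpy as np

def ridge_primal(X, y, gamma):
    """Solve min_w w'w + gamma * sum_i (y_i - w'x_i)^2 in the primal: a d x d system."""
    d = X.shape[1]
    return np.linalg.solve(np.eye(d) / gamma + X.T @ X, X.T @ y)  # w in R^d

def ridge_dual(X, y, gamma):
    """Same problem in the dual: an N x N system built from the linear kernel X X'."""
    N = X.shape[0]
    alpha = np.linalg.solve(np.eye(N) / gamma + X @ X.T, y)       # alpha in R^N
    return X.T @ alpha                                            # recover w = sum_i alpha_i x_i

# few inputs, many data points -> primal is cheap; many inputs, few points -> dual is cheap
X, y = np.random.randn(500, 20), np.random.randn(500)
assert np.allclose(ridge_primal(X, y, 1.0), ridge_dual(X, y, 1.0))
```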
Least Squares Support Vector Machines: “core models”

• Regression

      min_{w,b,e}  w^T w + γ Σ_i e_i²   s.t.  y_i = w^T φ(x_i) + b + e_i, ∀i

• Classification

      min_{w,b,e}  w^T w + γ Σ_i e_i²   s.t.  y_i (w^T φ(x_i) + b) = 1 − e_i, ∀i

• Kernel PCA (V = I), kernel spectral clustering (V = D^{−1})

      min_{w,b,e}  −w^T w + γ Σ_i v_i e_i²   s.t.  e_i = w^T φ(x_i) + b, ∀i

• Kernel canonical correlation analysis / partial least squares

      min_{w,v,b,d,e,r}  w^T w + v^T v + ν Σ_i (e_i − r_i)²   s.t.  e_i = w^T φ_1(x_i) + b,
                                                                    r_i = v^T φ_2(y_i) + d

        [Suykens & Vandewalle, 1999; Suykens et al., 2002; Alzate & Suykens, 2010]
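
As an illustration of how compact these core models are to train, a minimal sketch of LS-SVM regression with an RBF kernel, solving the dual linear system sketched earlier (1/2 scaling convention assumed; sigma is the kernel width):

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K(x, z) = exp(-||x - z||^2 / sigma^2) for all pairs of rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / sigma**2)

def lssvm_fit(X, y, gamma, sigma):
    """Solve [0 1'; 1 Omega + I/gamma] [b; alpha] = [0; y]."""
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # b, alpha

def lssvm_predict(Xnew, X, b, alpha, sigma):
    """yhat(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(Xnew, X, sigma) @ alpha + b
```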
Core models

Core model (parametric model, support vector machine, least-squares support
vector machine, Parzen kernel model)
      + regularization terms and additional constraints
      → optimal model representation
      → model estimate
Overview



• Models from data

• Examples

• Challenges for computational intelligence
Kernel spectral clustering

• Underlying model: ê_* = w^T φ(x_*),
  with q̂_* = sign[ê_*] the estimated cluster indicator at any x_* ∈ R^d.

• Primal problem: training on given data {x_i}_{i=1}^N

      min_{w,e}  −(1/2) w^T w + (γ/2) Σ_{i=1}^N v_i e_i²
      subject to  e_i = w^T φ(x_i), i = 1, ..., N

  with weights v_i (related to inverse degree matrix: V = D^{−1}).
  Dual problem:
      Ωα = λDα
  with Ω_ij = K(x_i, x_j) = φ(x_i)^T φ(x_j).

• Kernel spectral clustering [Alzate & Suykens, IEEE-PAMI, 2010], related to
  spectral clustering [Fiedler, 1973; Shi & Malik, 2000; Ng et al., 2002; Chung, 1997]
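
A minimal sketch of the dual computation, assuming an RBF kernel and omitting the centering/bias term of the full method; scipy.linalg.eigh solves the generalized eigenvalue problem Ωα = λDα directly:

```python
import numpy as np
from scipy.linalg import eigh

def kernel_spectral_clustering(X, k, sigma):
    """Sign-code the k-1 leading generalized eigenvectors of Omega alpha = lambda D alpha."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Omega = np.exp(-sq / sigma**2)            # kernel matrix
    D = np.diag(Omega.sum(axis=1))            # degree matrix
    _, vecs = eigh(Omega, D)                  # eigenvalues returned in ascending order
    E = vecs[:, -(k - 1):]                    # k-1 largest eigenvectors
    return np.sign(E)                         # binary codeword per data point
```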
Primal and dual model representations

bias term b for centering
k clusters, k − 1 sets of constraints (index l = 1, ..., k − 1)

      (P):  sign[ê_*^(l)] = sign[w^(l)T φ(x_*) + b_l]

      (D):  sign[ê_*^(l)] = sign[Σ_j α_j^(l) K(x_*, x_j) + b_l]

Advantages:
- out-of-sample extensions
- model selection
- solving large scale problems
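
The dual representation makes the out-of-sample extension a one-liner; a sketch, where K_new holds kernel evaluations between new points and the training points:

```python
import numpy as np

def assign_out_of_sample(K_new, alpha_l, b_l):
    """Cluster indicator sign[sum_j alpha_j^(l) K(x*, x_j) + b_l] for new points x*."""
    return np.sign(K_new @ alpha_l + b_l)
```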
Out-of-sample extension and coding

[figure: two scatter plots in the (x(1), x(2)) plane illustrating cluster coding and out-of-sample assignment]
Example: image segmentation

[figure: validation points plotted in the (e(1)_{i,val}, e(2)_{i,val}, e(3)_{i,val}) space]
Example: image segmentation

[figures: Fisher criterion versus number of clusters k (model selection); validation points in the (α(1)_{i,val}, α(2)_{i,val}) plane]
Kernel spectral clustering: sparse kernel models

[figure: original image and sparse kernel model segmentation]

Incomplete Cholesky decomposition: ‖Ω − GG^T‖₂ ≤ η with G ∈ R^{N×R} and R ≪ N
Image (Berkeley image dataset): 321 × 481 (154,401 pixels), 175 SV

      e_*^(l) = Σ_{i∈S_SV} α_i^(l) K(x_i, x_*) + b_l
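
A sketch of the incomplete (pivoted) Cholesky factorization, assuming access to a kernel function: it builds G column by column, evaluating only R columns of Ω, and stops when the residual trace (which upper-bounds the norm of Ω − GG^T for a PSD residual) drops below η:

```python
import numpy as np

def incomplete_cholesky(kernel, X, eta):
    """Return G (N x R) with trace(Omega - G G^T) <= eta, R << N."""
    N = len(X)
    d = np.array([kernel(X[i], X[i]) for i in range(N)])   # residual diagonal
    G = np.zeros((N, 0))
    while d.sum() > eta:
        j = int(np.argmax(d))                              # pivot: largest residual
        col = np.array([kernel(X[i], X[j]) for i in range(N)])
        if G.shape[1]:
            col -= G @ G[j]                                # subtract current approximation
        g = col / np.sqrt(d[j])
        G = np.column_stack([G, g])
        d = np.maximum(d - g**2, 0.0)                      # update residual diagonal
    return G
```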
Highly sparse kernel models: image segmentation

[figure: reduced set of support vectors plotted in the (e(1)_i, e(2)_i, e(3)_i) space]

only 3k = 12 support vectors [Alzate & Suykens, Neurocomputing, 2011]
Kernel spectral clustering: adding prior knowledge

• Pair of points x†, x‡: c = 1 must-link, c = −1 cannot-link

• Primal problem [Alzate & Suykens, IJCNN 2009]

      min_{w^(l), e^(l), b_l}  −(1/2) Σ_{l=1}^{k−1} w^(l)T w^(l) + (1/2) Σ_{l=1}^{k−1} γ_l e^(l)T D^{−1} e^(l)
      subject to  e^(1) = Φ_{N×n_h} w^(1) + b_1 1_N
                  ...
                  e^(k−1) = Φ_{N×n_h} w^(k−1) + b_{k−1} 1_N
                  w^(1)T φ(x†) = c w^(1)T φ(x‡)
                  ...
                  w^(k−1)T φ(x†) = c w^(k−1)T φ(x‡)

• Dual problem: yields rank-one downdate of the kernel matrix
Kernel spectral clustering: example

[figure: original image and segmentation without constraints]
Kernel spectral clustering: example

[figure: original image and segmentation with constraints]
Hierarchical kernel spectral clustering




Hierarchical kernel spectral clustering:
- looking at different scales
- use of model selection and validation data

[Alzate & Suykens, Neural Networks, 2012]




Power grid: kernel spectral clustering of time-series

[figure: normalized load versus hour of day for three detected clusters]

Electricity load: 245 substations in Belgian grid (1/2 train, 1/2 validation)
x_i ∈ R^{43,824}: spectral clustering on high dimensional data (5 years)
3 of 7 detected clusters:
- 1: Residential profile: morning and evening peaks
- 2: Business profile: peaked around noon
- 3: Industrial profile: increasing morning, oscillating afternoon and evening

[Alzate, Espinoza, De Moor, Suykens, 2009]
Dimensionality reduction and data visualization



• Traditionally:
  commonly used techniques are e.g. principal component analysis (PCA),
  multi-dimensional scaling (MDS), self-organizing maps (SOM)

• More recently:
  isomap, locally linear embedding (LLE), Hessian locally linear embedding,
  diffusion maps, Laplacian eigenmaps
  (“kernel eigenmap methods and manifold learning”)
    [Roweis & Saul, 2000; Coifman et al., 2005; Belkin et al., 2006]


• Kernel maps with reference point [Suykens, IEEE-TNN 2008]:
  data visualization and dimensionality reduction by solving a linear system




Kernel maps with reference point: formulation

• Kernel maps with reference point [Suykens, IEEE-TNN 2008]:
  - LS-SVM core part: realize dimensionality reduction x → z
  - Regularization term: (z − P_D z)^T (z − P_D z) = Σ_{i=1}^N ‖z_i − Σ_{j=1}^N s_ij D z_j‖₂²
    with D diagonal matrix and s_ij = exp(−‖x_i − x_j‖₂² / σ²)
  - reference point q (e.g. first point; sacrificed in the visualization)

• Example: d = 2

      min_{z, w_1, w_2, b_1, b_2, e_{i,1}, e_{i,2}}  (1/2)(z − P_D z)^T (z − P_D z)
            + (ν/2)(w_1^T w_1 + w_2^T w_2) + (η/2) Σ_{i=1}^N (e_{i,1}² + e_{i,2}²)
      such that  c_{1,1}^T z = q_1 + e_{1,1}
                 c_{1,2}^T z = q_2 + e_{1,2}
                 c_{i,1}^T z = w_1^T φ_1(x_i) + b_1 + e_{i,1}, ∀i = 2, ..., N
                 c_{i,2}^T z = w_2^T φ_2(x_i) + b_2 + e_{i,2}, ∀i = 2, ..., N

Coordinates in low dimensional space: z = [z_1; z_2; ...; z_N] ∈ R^{dN}
Kernel maps: spiral example

[figure: 3D spiral data (x_1, x_2, x_3) and the resulting 2D embeddings ẑ for reference points q = [+1; −1] and q = [−1; −1]]

training data (blue *), validation data (magenta o), test data (red +)

Model selection:  min Σ_{i,j} ( ẑ_i^T ẑ_j / (‖ẑ_i‖₂ ‖ẑ_j‖₂) − x_i^T x_j / (‖x_i‖₂ ‖x_j‖₂) )²
Kernel maps: visualizing gene distribution


[Figure: 3-D projection of the genes onto coordinates (z1, z2, z3).]



Alon colon cancer microarray data set: 3D projections
Dimension of the input space: 62 (each gene is represented by its expression across 62 tissue samples)
Number of genes: 1500 (training: 500, validation: 500, test: 500)


Models from data: a unifying picture - Johan Suykens                                                                  25
Kernels & Tensors
         neuroscience:    EEG data
                          (time samples × frequency × electrodes)
       computer vision:   image/video compression, completion, ···
                          (pixel × illumination × expression × ···)
            web mining:   analyzing user behavior
                          (users × queries × webpages)




                    vector x                  matrix X          tensor X

- Naive kernel: $K(\mathcal{X}, \mathcal{Y}) = \exp\left(-\frac{1}{2\sigma^2}\, \|\mathrm{vec}(\mathcal{X}) - \mathrm{vec}(\mathcal{Y})\|_2^2\right)$ (see the sketch below)
- Tensorial kernel exploiting structure: learning from few examples
                   [Signoretto et al., Neural Networks, 2011 & IEEE-TSP, 2012]
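A minimal NumPy sketch of the naive kernel referenced above (the structured tensorial kernel of the cited work is not reproduced here; the function name and test shapes are illustrative):

```python
import numpy as np

def naive_tensor_kernel(X, Y, sigma):
    # RBF kernel on flattened tensors: all multilinear structure is discarded.
    d = np.ravel(X) - np.ravel(Y)
    return np.exp(-(d @ d) / (2.0 * sigma ** 2))

# e.g. two third-order tensors (time samples x frequency x electrodes)
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16, 8))
Y = rng.standard_normal((64, 16, 8))
print(naive_tensor_kernel(X, Y, sigma=30.0))
```

Because vectorization destroys the mode structure, such a kernel typically needs many training examples; exploiting the tensor structure is what enables learning from few examples.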

Models from data: a unifying picture - Johan Suykens                                 26
Tensor completion




Mass spectral imaging: sagittal section mouse brain [data: E. Waelkens, R. Van de Plas]
Tensor completion using nuclear norm regularization [Signoretto et al., IEEE-SPL, 2011]
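As an illustration of the nuclear-norm idea, a minimal matrix-completion sketch using singular value thresholding (a soft-impute style iteration; the cited tensor method generalizes this by shrinking the different matricizations of the tensor, which is omitted here):

```python
import numpy as np

def svt(A, tau):
    # Proximal operator of the nuclear norm: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete(M, observed, tau=1.0, n_iter=200):
    # observed: boolean mask of the known entries of M.
    X = np.where(observed, M, 0.0)
    for _ in range(n_iter):
        X = np.where(observed, M, svt(X, tau))  # shrink, then re-impose data
    return X
```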

Models from data: a unifying picture - Johan Suykens                                  27
Overview



• Models from data

• Examples

• Challenges for computational intelligence
Challenges for Computational Intelligence



- Bridging gaps between advanced methods and end-users



- New mathematical and methodological frameworks



- Scalable algorithms for large-scale and high-dimensional data




Models from data: a unifying picture - Johan Suykens            28
Acknowledgements

• Colleagues at ESAT-SCD (especially research units: systems, models,
  control - biomedical data processing - bioinformatics):
    C. Alzate, A. Argyriou, J. De Brabanter, K. De Brabanter, B. De Moor, M. Espinoza,
    T. Falck, D. Geebelen, X. Huang, V. Jumutc, P. Karsmakers, R. Langone, J. Lopez,
    J. Luts, R. Mall, S. Mehrkanoon, Y. Moreau, K. Pelckmans, J. Puertas, L. Shi, M.
    Signoretto, V. Van Belle, R. Van de Plas, S. Van Huffel, J. Vandewalle, C. Varon, S.
    Yu, and others

• Many people for joint work, discussions, invitations, organizations
• Support from ERC AdG A-DATADRIVE-B, KU Leuven, GOA-MaNet,
  COE Optimization in Engineering OPTEC, IUAP DYSCO, FWO projects,
  IWT, IBBT eHealth, COST




Models from data: a unifying picture - Johan Suykens                                 29

  • 47. Acknowledgements • Colleagues at ESAT-SCD (especially research units: systems, models, control - biomedical data processing - bioinformatics): C. Alzate, A. Argyriou, J. De Brabanter, K. De Brabanter, B. De Moor, M. Espinoza, T. Falck, D. Geebelen, X. Huang, V. Jumutc, P. Karsmakers, R. Langone, J. Lopez, J. Luts, R. Mall, S. Mehrkanoon, Y. Moreau, K. Pelckmans, J. Puertas, L. Shi, M. Signoretto, V. Van Belle, R. Van de Plas, S. Van Huffel, J. Vandewalle, C. Varon, S. Yu, and others • Many people for joint work, discussions, invitations, organizations • Support from ERC AdG A-DATADRIVE-B, KU Leuven, GOA-MaNet, COE Optimization in Engineering OPTEC, IUAP DYSCO, FWO projects, IWT, IBBT eHealth, COST Models from data: a unifying picture - Johan Suykens 29