BALANCING BOARD MACHINES

Frederic Maire
School of Software Engineering and Data Communication,
Queensland University of Technology, Box 2434, Brisbane, Qld 4001, Australia
f.maire@qut.edu.au



Abstract

The support vector machine solution corresponds to the center of the largest sphere inscribed in version space. Alternative approaches like the Bayes Point Machine (BPM) and the Analytic Center Machine have suggested that the generalization performance can be further enhanced by considering other possible centers of version space, like the centroid (center of mass). We present an algorithm to compute exactly the centroid of higher dimensional polyhedra, then derive approximation algorithms to build a new learning machine whose performance is comparable to BPM.

Key Words
Kernel machines, Bayesian Point, Centroid.

1. Introduction

Kernel classifiers are non-linear decision functions for binary classification. In the Kernel Machine framework (Muller, Mika, Ratch, Tsuda & Scholkopf, [1]; Scholkopf & Smola, [2]), a feature mapping x ↦ φ(x) from an input space to a feature space is given (generally, implicitly via a kernel function), as well as a training set of input vectors {x_1, …, x_m} with the corresponding class labels {y_1, …, y_m}, where y_i ∈ {−1, +1}. The learning problem is formulated as a search problem for a linear classifier hypothesis (a weight vector w) belonging to a subset of the feature space called version space: {w | ∀i ∈ [1, m], y_i ⟨w, φ(x_i)⟩ > 0}. In other words, version space is the set of weight vectors w that are consistent with the training set. Because only the direction of w matters for classification purposes, without loss of generality we can restrict the search for w to the unit sphere in feature space. The training algorithm of a Support Vector Machine (SVM) returns the weight vector w that has the smallest maximum angle between w and the y_i φ(x_i)'s. The kernel trick is that for certain feature spaces and mappings φ, there exist easily computable kernel functions k defined on the input space such that k(x, y) = ⟨φ(x), φ(y)⟩. With such a kernel function k, the computation of the inner products ⟨φ(x), φ(y)⟩ does not require the explicit knowledge of φ. In fact, for a given kernel function k, there may exist many different mappings φ.

Geometrically, each training example (x_i, y_i) defines a half-space in feature space through the constraint y_i ⟨w, φ(x_i)⟩ > 0 on w. It is easy to see that version space is a polyhedral cone of feature space. Figure 1 shows a bird's-eye view (a slice of the polyhedral cone) of version space.

Figure 1: An elongated version space. The SVM point is the centre of the sphere.

The SVM solution point w_SVM is the centre of the largest sphere whose centre is a unit vector and which is contained in the polyhedral cone.

Bayes Point Machines (BPM) are a well-founded improvement which approximates the Bayes-optimal decision by the centroid (also known as the centre of mass or barycentre) of version space. It happens that the Bayes point is very close to the centroid of version space in high dimensional spaces. The Bayes point achieves better generalization performance in comparison to SVMs (Opper & Haussler, [3]; Shawe-Taylor & Williamson, [4]; Graepel, Herbrich & Campbell, [5]; Watkin, [6]).
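In the kernel expansion w = Σ_j α_j φ(x_j) (used throughout section 3), the version-space condition reduces to y_i (Kα)_i > 0 for all i, since ⟨w, φ(x_i)⟩ = Σ_j α_j k(x_j, x_i). A minimal Matlab check, where K is the kernel matrix, y the vector of ±1 labels, and alpha an illustrative coefficient vector rather than a quantity from the paper:

    % Version-space membership check in the kernel expansion
    % w = sum_j alpha(j) * phi(x_j).  K(i,j) = k(x_i, x_j).
    in_version_space = all(y .* (K * alpha) > 0);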

An intuitive way to see why the centroid is a good choice is to view version space as a committee of experts who all agree on the training set. A new input vector x corresponds to a hyperplane in feature space that may cut version space in two parts. In the example of Figure 1, the experts on the right of the hyperplane normal to φ(x) classify x positively, whereas the experts on the left classify x negatively. It is reasonable to use the opinion of the majority of the experts that successfully classified the training set to predict the class label of x. The expert that agrees the most with the majority vote on new inputs is precisely the Bayes point. In a standard committee machine, for each new input we seek the opinions of a finite number of experts, then take a majority vote, whereas in a BPM, the expert that most often agrees with the majority vote of the infinite committee (version space) is delegated the task of classifying the new input.

Following Rujan [7], Herbrich and Graepel [8] introduced two algorithms to stochastically approximate the centroid of version space: a billiard sampling algorithm and a sampling algorithm based on the well known perceptron algorithm.

In this paper, we present an algorithm to compute exactly the centroid of a polyhedron in a high dimensional space. From this exact algorithm, we derive an algorithm to approximate a centroid position in a polyhedral cone. We show empirically that the corresponding machine presents better generalization capability than SVMs on a number of benchmark data sets.

In section 2, we introduce an algorithm to compute exactly the centroid of higher dimensional polyhedra. In section 3, we show how to use this algorithm to approximate the centroid of version space. In section 4, some implementation issues are considered and some experimental results are presented.

2. Exact Computation of the Centroid of a Higher Dimensional Polyhedron

A polyhedron P is the intersection of a finite number of half-spaces. It is best represented by a system of non-redundant linear inequalities P = {x | Ax ≤ b}. Recall that the 1-volume is the length, the 2-volume is the surface area and the 3-volume is the every-day-life volume. The algorithm that we introduce for computing the centroid of an n-dimensional polyhedron is an extension of the work by Lasserre [10], who showed that the n-dimensional volume V(n, A, b) of a polyhedron P = {x | Ax ≤ b} is related to the (n−1)-dimensional volumes of its facets and the row vectors of its matrix A by the following formula:

    V(n, A, b) = (1/n) Σ_i (b_i / ‖a_i‖) × V_i(n−1, A, b)

where a_i denotes the ith row of A and V_i(n−1, A, b) denotes the (n−1)-dimensional volume of the ith facet. The computation of the centroid and the (n−1)-volume of a facet is done by variable elimination. Geometrically, this amounts to projecting the facet onto an axis-parallel hyperplane, then computing the volume and the centroid of this projection recursively in a lower dimensional space. From the volume and centroid of the projected facet, we can derive the centroid and volume of the original facet.

The formulae below are obtained by considering the n-fold integral defining the n-dimensional volume and decomposing the polyhedron into cones. The centroid of a polyhedron can be computed recursively in the following manner (a numerical sanity check on the unit square appears at the end of this section):

•   Compute recursively the centroids G_{F_i} and the (n−1)-volumes V_{F_i} of each facet (face of dimension n−1) F_i of P. Each facet F_i corresponds to the intersection of P with the hyperplane defined by the ith row of the system Ax ≤ b.

•   Compute G_E = Σ_i (V_{F_i} / Σ_j V_{F_j}) × G_{F_i}, the centroid of the envelope of P (the union of the facets F_i).

•   Compute the centroids G_{C_i} and the n-volumes V_{C_i} of the cones C_i = cone(G_E, F_i) rooted at G_E. If h_i is the distance from G_E to the hyperplane containing F_i, then V_{C_i} = (h_i / n) × V_{F_i} and, as vectors, G_E G_{C_i} = (n / (n+1)) × G_E G_{F_i}.

•   Compute G, the centroid of P, as the weighted sum G = Σ_i (V_{C_i} / Σ_j V_{C_j}) × G_{C_i}.

It is useful to observe that the computation of the volume and the centroid of an (n−1)-dimensional polyhedron in an n-dimensional space is identical to the computation of the volume and the centroid of a facet of an n-dimensional polyhedron. For further details, see the Matlab source code at http://www.fit.qut.edu.au/~maire/G.
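As a sanity check of these formulae, the following Matlab fragment applies them to the unit square, where every quantity is known in closed form. The facet volumes, centroids and distances are written out by hand for illustration, rather than obtained by the recursive variable-elimination step:

    % Recursive centroid formulae on the unit square [0,1]^2 (n = 2).
    n  = 2;
    VF = [1; 1; 1; 1];                    % (n-1)-volumes of the four edges
    GF = [0.5 0; 1 0.5; 0.5 1; 0 0.5];    % facet centroids (one per row)
    GE = (VF' * GF) / sum(VF);            % centroid of the envelope
    h  = [0.5; 0.5; 0.5; 0.5];            % distances from GE to the facets
    VC = (h / n) .* VF;                   % cone volumes; sum(VC) = 1 = area
    GC = GE + (n/(n+1)) * (GF - GE);      % cone centroids
    G  = (VC' * GC) / sum(VC)             % returns [0.5 0.5], as expected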

3. Balancing Board Machines

3.1 A Mechanical Point of View

The point of contact of a board posed in equilibrium on a sphere (assumed to be the only source of gravity) is the centroid of the board. This observation is the basis of our "balancing board algorithm". In the rest of this paper, the term "board" will refer to the intersection of the polyhedral cone of version space with a hyperplane normal to a unit vector w of version space. This definition implies that if the polyhedral cone is n-dimensional then a board will be an (n−1)-dimensional polyhedron tangent to the unit sphere.

In the algorithm we propose, the approximation w of the centroid direction of the cone is refined by computing the centroid of the board normal to w, and then rotating w towards the centroid of the board (stopping at a local minimum of the volume of the board in this line search). A sketch of this loop is given below.
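One possible shape of this loop in Matlab-style pseudocode. Here svm_direction, board_centroid and board_volume are hypothetical helpers standing in for the procedures detailed in sections 3.2.1 to 3.2.7, A holds the cone inequalities, and the step grid and tolerances are arbitrary illustrative choices:

    max_iter = 50;  tol = 1e-6;
    w = svm_direction(A);                 % spheric centre, see section 3.2.2
    for iter = 1:max_iter
        g = board_centroid(A, w);         % centroid of the board normal to w
        g = g / norm(g);
        % line search: rotate w towards g, keeping the direction with
        % the smallest board volume found along the way
        ts = linspace(0, 1, 20);
        vols = zeros(size(ts));
        for j = 1:numel(ts)
            v = (1 - ts(j)) * w + ts(j) * g;
            vols(j) = board_volume(A, v / norm(v));
        end
        [~, jstar] = min(vols);
        w_new = (1 - ts(jstar)) * w + ts(jstar) * g;
        w_new = w_new / norm(w_new);
        if norm(w_new - w) < tol, break; end
        w = w_new;
    end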
Figure 2 illustrates the balancing process of a board. Notice that Figure 2 is simply an illustration, as in dimension 2 the line search would succeed in just one line-search iteration!

Figure 2: Top left, initial board. Top right, after one iteration. Bottom left, after two iterations. Bottom right, after three iterations.

3.2 Exploring the Polyhedral Cone

Statistical learning theory (Scholkopf & Smola, [2]) tells us that the Bayes point w belongs to the vector subspace V generated by the family of vectors {φ(x_1), …, φ(x_m)}, that is, w is of the form w = Σ_j α_j φ(x_j). Once we know an orthonormal basis of V (the orthonormality is with respect to the inner product in feature space corresponding to the kernel function in the input space), we can express the polyhedral cone inequalities with respect to this orthonormal basis. Then we can apply the formulae of section 2 to compute the centroid of any polyhedron expressed in this orthonormal basis. The kernel PCA basis is an orthonormal basis B of V. Its basis vectors are the eigenvectors of the symmetric matrix K = {k(x_i, x_j)}_{i,j}.

By expressing the polyhedral cone defined by the training examples in B, we will be able to approximate a centroid direction with the board balancing algorithm sketched in section 3.1 and detailed below.

The complexity of the algorithm of section 2 to compute exactly the centroid is unfortunately exponential. The computational cost of the exact calculation of the centroid is too high even for medium size data sets. However, the recursive formulae allow us to derive an approximation of the volume and the centroid of a polyhedron once we have approximations for the volumes and the centroids of its facets.
Because the balancing board algorithm requires several board centroid estimations, it is desirable to recycle intermediate results as much as possible to achieve a significant reduction in computation time. Because the intersection of a hyperplane and a spheric cone is an ellipsoid, we estimate the volume and the centroid of the intersection of the board and a facet of the polyhedral cone (this intersection is (n−2)-dimensional) with the volume and the centroid of the intersection of the board and the largest spheric cone contained in the facet (this spheric cone is (n−1)-dimensional). The computation of these largest spheric cones is done only once. The centre of the ellipsoid and its quadratic matrix are easily derived from the centre and radius of the spheric cone. These derivations are explained in the next sub-sections.

To simplify the computations, we have restricted our study to non-singular kernel matrices (like those obtained from Gaussian kernels).

3.2.1 Change of Basis

Let w_B be the coordinates of w with respect to the basis {φ(x_1), …, φ(x_m)}. Recall that the kernel PCA basis is made of the eigenvectors of K. Let w_U be the coordinates of w with respect to the kernel PCA orthonormal basis {u_1, …, u_m}. We have w_B = U w_U. Let M = K + λI, where λ is a non-negative regularization parameter as in (Herbrich et al., [9]). We are looking for w_B such that −diag(y) M w_B ≤ 0 with ⟨w, w⟩ = 1 and w near the centroid direction of the polyhedral cone. As we have ⟨w, w⟩ = (w_U)^T w_U, in practice we look for the centroid direction of the cone

    −diag(y) M U w_U ≤ 0,  (w_U)^T w_U = 1.        (2)
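A sketch of this change of basis for a Gaussian kernel, in recent Matlab (implicit expansion assumed); X is m-by-d with one input per row, y the ±1 labels, and sigma and lambda are illustrative parameter choices rather than values from the paper:

    % Build the polyhedral cone (2) in the kernel PCA basis.
    D2 = sum(X.^2, 2) + sum(X.^2, 2)' - 2 * (X * X');  % squared distances
    K  = exp(-D2 / (2 * sigma^2));                     % Gaussian kernel matrix
    [U, ~] = eig((K + K') / 2);                        % kernel PCA basis vectors
    M  = K + lambda * eye(size(K, 1));                 % regularized matrix
    A  = -diag(y) * M * U;                             % cone: A * wU <= 0
    A  = A ./ sqrt(sum(A.^2, 2));                      % normalize the rows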
3.2.2 Computing the Spheric Centre of a Polyhedral Cone Derived from a Non-Singular Mercer Kernel Matrix

Let Ax ≤ 0 be the non-empty polyhedral cone derived from the kernel matrix. The matrix A is square (m = n). Without loss of generality, we assume that its rows are normalized; that is, each row is a vector of norm 1. Recall that the spheric centre w_s of the cone (the direction of the centre of the largest sphere contained in the polyhedral cone and whose centre is at distance one from zero) corresponds to the SVM solution. Because A is square and non-singular, each facet of the polyhedral cone touches the largest sphere. If each facet is moved by a distance of one in the direction of its normal vector, the new cone obtained is a translation of the original cone in the direction of w_s. That is, w_s can be obtained by solving Ax = −1, where 1 denotes the vector of all ones.
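In Matlab this observation becomes a two-line computation, assuming A is the normalized constraint matrix of the previous sketch:

    % Spheric centre of the cone {x | Ax <= 0}, A square and non-singular.
    ws = A \ (-ones(size(A, 1), 1));    % solve A x = -1
    u  = ws / norm(ws);                 % unit direction of the spheric centre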
Once the direction u = w_s of the spheric centre of the polyhedral cone is determined, the radius r of the largest sphere centered at u can be computed. Here the radius of the spheric cone is defined as the radius of the largest (n−1)-sphere contained in the intersection of the cone and the hyperplane u^T x = 1. We use this definition to avoid geodesics.

3.2.3 Computation of r

We write A(k,:) to denote the kth row of matrix A. For each i, let α_i = π/2 − acos(−A(i,:) u). The scalar r is the minimum over all i of tan(α_i). If we are interested in the attributes of the cone contained in the facet A(k,:) x = 0, we simply solve the system

    A([1:k−1, k+1:n], :) x = −1,  A(k,:) x = 0.
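A direct transcription of the radius computation, with A row-normalized and u the unit direction from section 3.2.2:

    % Radius of the spheric cone of section 3.2.3.
    alpha = pi/2 - acos(-A * u);        % angles alpha_i, one per facet
    r     = min(tan(alpha));            % radius on the hyperplane u'*x = 1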
3.2.4 Spheric Cone Equation

Given the characteristic attributes u and r of a spheric cone, we can derive a simple equation for the cone. Let z = (u^T x) u and y = x − z. The cone equation is y^T y = r² z^T z. An alternative equation (derived from Pythagoras' theorem) is x^T x = (1 + r²)(u^T x)².

Our estimation of the volume and centroid of the board requires the estimation of the volume and centroid of the intersection of a cone and two hyperplanes (namely a facet and the hyperplane containing the board).

3.2.5 Intersection of a Spheric Cone and Two Hyperplanes

Consider the cone x^T x = (1 + r²)(u^T x)² contained in the kth facet (that is, A(k,:) u = 0). Let us compute the ellipsoid defined by the intersection of this cone and the hyperplane w^T x = 1 (w is normal to the board). Let Q = [q_1, …, q_{n−2}] = null([w^T ; A(k,:)]). Let h be the intersection of the ray defined by u and the hyperplane w^T x = 1, and let us make the change of variables x = h + Q z. One can easily check that for all z ∈ R^{n−2}, w^T (h + Q z) = 1 and A(k,:)(h + Q z) = 0.

We derive now the equation of the ellipsoid with respect to z. From x^T x = (1 + r²)(u^T x)², we obtain

    (h + Q z)^T (h + Q z) = (1 + r²)(u^T h + u^T Q z)².

After developing the expression, we get

    h^T h + 2 h^T Q z + z^T Q^T Q z = (1 + r²)(u^T h)² + (1 + r²)(z^T Q^T u u^T Q z) + (1 + r²)(2 u^T h u^T Q z).

After regrouping, we have

    z^T Q^T (I_n − (1 + r²) u u^T) Q z + 2 (h^T − (1 + r²) u^T h u^T) Q z + h^T h − (1 + r²)(u^T h)² = 0.

From this expression we can derive an expression of the form (z − c)^T D (z − c) = b that will tell us the (n−2)-volume of the ellipsoid and its centre. It is easy to check that h + Q c is the centre of the ellipsoid in R^n.
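Completing the square in the last display gives c, D and b directly; a sketch, with Q, h, u and r as defined above:

    % Ellipsoid (z - c)' * D * (z - c) = b from the quadratic form above.
    D = Q' * (eye(size(Q, 1)) - (1 + r^2) * (u * u')) * Q;  % quadratic part
    f = Q' * (h - (1 + r^2) * (u' * h) * u);                % half the linear part
    g = h' * h - (1 + r^2) * (u' * h)^2;                    % constant term
    c = -D \ f;                                             % centre in z-coordinates
    b = c' * D * c - g;                                     % right-hand side
    centre = h + Q * c;                                     % centre of the ellipsoid in R^n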
In the next subsection, we show how to compute the volume of the ellipsoid.
3.2.6 Volume of an Ellipsoid (x − c)^T D (x − c) = b

Without loss of generality, we assume that c = 0 and b = 1. The matrix D is symmetric non-negative, therefore there exists a decomposition D = P Δ P^T where P is orthogonal and Δ is non-negative and diagonal, with diagonal entries δ_1, …, δ_n. Let y = P^T x; the equation of the ellipsoid becomes Σ_i δ_i y_i² = 1. Let z_i = √δ_i y_i. We can check that if y is on the ellipsoid then z is on the unit sphere, and reciprocally. That is, the ellipsoid is obtained from the unit sphere by the linear transformation of matrix

    M = diag(1/√δ_1, …, 1/√δ_n).

Recall that if f(x) = M x is a linear transformation and S is a subset of the vector space, then we have vol(f(S)) = abs(det(M)) × vol(S). The volume of the ellipsoid is therefore Π_{i=1}^n (1/√δ_i) times the n-volume of the unit n-sphere.

For completeness, let us mention that the volume of an n-sphere of radius r is (2 r^n π^{n/2}) / (n Γ(n/2)), and the volume of the n-rectangle containing the ellipsoid is 2^n × Π_{i=1}^n (1/√δ_i).

3.2.7 Distance from w to a Facet

To compute the (n−1)-volume of the intersection of the board w^T x = 1 and the polyhedral cone P, we need to find for each facet A(k,:) x ≤ 0 the point x in the plane generated by w and A(k,:)^T that belongs to this intersection, that is, the orthogonal projection of w on the hinge defined as the intersection of the board and the kth cone facet. The point x = α A(k,:)^T + β w must satisfy

    w^T x = 1,  A(k,:) x = 0.

Therefore

    γ α + β = 1,  α + γ β = 0,  where γ = A(k,:) w.

That is,

    x = [A(k,:)^T  w] × [γ 1; 1 γ]^{−1} × [1; 0].
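As a sketch, the projection point for facet k (with the rows of A and the vector w normalized, as assumed throughout; k is an illustrative index):

    % Orthogonal projection of w on the hinge of the kth facet.
    ak    = A(k, :)';                     % normal of the kth facet
    gamma = ak' * w;                      % gamma = A(k,:) * w
    ab    = [gamma 1; 1 gamma] \ [1; 0];  % coefficients [alpha; beta]
    x     = [ak, w] * ab;                 % satisfies w'*x = 1, ak'*x = 0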
4. Implementation Issues and Experimental Results
We have implemented the exact computation of the centroid and the volume in Matlab. A direct recursive implementation of Lasserre's formula would be very inefficient, as faces of dimension k share faces of dimension k − 1. Our implementation caches the volumes and centroids of the lower dimensional faces in a hash-table.

Our algorithm has been validated by comparing the values returned with a Monte-Carlo method.

As Lasserre's formula is valid only if the polyhedron is represented as a system of non-redundant linear inequalities, redundancy must be detected and eliminated by solving linear optimization problems.

The kernel matrix of a Gaussian kernel can only be singular when identical input vectors occur more than once in the training set. We remove repeated occurrences of the same input vector and assign the most common label for this input vector to the occurrence that we leave in the training set.

The table which follows summarises the generalization performance (percentage of correct predictions on test sets) of the Balancing Board Machine (BBM) on 6 standard benchmarking data sets from the UCI Repository, comparing results for illustrative purposes with equivalent hard margin support vector machines. In each case the data was randomly partitioned into 20 training and test sets in the ratio 60%:40%.

    Data set         SVM      BBM
    heart disease    58.36    58.40
    thyroid          94.34    95.23
    diabetes         66.89    67.68
    waveform         83.50    83.50
    sonar            85.06    85.78
    ionosphere       86.79    86.86

The results obtained with a BBM are comparable to those obtained with a BPM, but the improvement is not always as dramatic as those reported in (Herbrich et al., [9]). We observed that the improvement was generally better for smaller data sets. We suspect that this is due to the fact that the volumes considered become very small in high dimensional spaces. In fact, on a PC, unit spheres "vanish" when their dimension exceeds 340: the volume of a unit sphere of dimension 340 is about 4.5 × 10^−223. This is why we consider the logarithm of the volume in our programs.
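Working in log-space avoids this underflow; a short check using the n-sphere volume formula of section 3.2.6:

    % Log-volume of an n-sphere of radius r, without underflow:
    % log V = log 2 + n*log(r) + (n/2)*log(pi) - log(n) - gammaln(n/2).
    n = 340;  r = 1;
    logV = log(2) + n*log(r) + (n/2)*log(pi) - log(n) - gammaln(n/2);
    exp(logV)          % underflows to 0 in double precision
    logV / log(10)     % about -222.35, i.e. a volume of about 4.5e-223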
5. Conclusion

The exact computation algorithm can be useful as a benchmarking tool for people developing new centroid approximation algorithms. We do not claim that our BBM approach is superior to any other, given that the computational cost is of the order of m times the cost of an SVM computation (where m is the number of training examples).

Replacing the ellipsoids with a more accurate estimation would probably give better results, but deriving the volume and the centroid of the intersection of a facet and a board from the volume and the centroid of the intersection of the same facet with another board seems to be a hard problem.

The computation of the SVM point presented in section 3.2.2 provides an efficient learning algorithm for Gaussian kernels.

6. Acknowledgement

I would like to thank Professor Tom Downs and Professor Peter Bartlett for their valuable comments on a previous version of the BBM algorithm. This work was partially supported by an ATN grant.

References

[1] K. Muller, S. Mika, G. Ratch, K. Tsuda, and B. Scholkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks, vol. 12, no. 2, 2001, pp. 181-201.
[2] B. Scholkopf and A. Smola, Learning with Kernels, http://www.kernel-machines.org/
[3] M. Opper and D. Haussler, Generalization performance of Bayes optimal classification algorithm for learning a perceptron, Phys. Rev. Lett., vol. 66, p. 2677, 1991.
[4] J. Shawe-Taylor and R. C. Williamson, A PAC analysis of a Bayesian estimator, Royal Holloway, Univ. London, Tech. Rep. NC2-TR-1997-013, 1997.
[5] T. Graepel, R. Herbrich, and C. Campbell, Bayes point machines: Estimating the Bayes point in kernel space, in Proc. IJCAI Workshop on Support Vector Machines, 1999, pp. 23-27.
[6] T. Watkin, Optimal learning with a neural network, Europhys. Lett., vol. 21, pp. 871-877, 1993.
[7] P. Ruján, Playing billiard in version space, Neural Comput., vol. 9, pp. 197-238, 1996.
[8] R. Herbrich and T. Graepel, Large scale Bayes point machines, Advances in Neural Information Processing Systems 13, 2001.
[9] R. Herbrich, T. Graepel, and C. Campbell, Bayes point machines, Journal of Machine Learning Research, 1 (2001), pp. 245-279.
[10] J. Lasserre, An analytical expression and an algorithm for the volume of a convex polyhedron in R^n, Journal of Optimization Theory and Applications, vol. 39, no. 3, 1983.
[11] A. Schrijver, Theory of Linear and Integer Programming, Wiley-Interscience, 1990.
[12] T. B. Trafalis and A. M. Malyscheff, An analytic center machine, Machine Learning, vol. 46, 2002, pp. 203-223.

Mais conteúdo relacionado

Mais procurados

11.quadrature radon transform for smoother tomographic reconstruction
11.quadrature radon transform for smoother  tomographic reconstruction11.quadrature radon transform for smoother  tomographic reconstruction
11.quadrature radon transform for smoother tomographic reconstruction
Alexander Decker
 
20110319 parameterized algorithms_fomin_lecture01-02
20110319 parameterized algorithms_fomin_lecture01-0220110319 parameterized algorithms_fomin_lecture01-02
20110319 parameterized algorithms_fomin_lecture01-02
Computer Science Club
 
20110319 parameterized algorithms_fomin_lecture03-04
20110319 parameterized algorithms_fomin_lecture03-0420110319 parameterized algorithms_fomin_lecture03-04
20110319 parameterized algorithms_fomin_lecture03-04
Computer Science Club
 
Chapter 5 Thomas 9 Ed F 2008
Chapter 5 Thomas 9 Ed F 2008Chapter 5 Thomas 9 Ed F 2008
Chapter 5 Thomas 9 Ed F 2008
angelovedamla
 
改进的固定点图像复原算法_英文_阎雪飞
改进的固定点图像复原算法_英文_阎雪飞改进的固定点图像复原算法_英文_阎雪飞
改进的固定点图像复原算法_英文_阎雪飞
alen yan
 
Simulated annealing for MMR-Path
Simulated annealing for MMR-PathSimulated annealing for MMR-Path
Simulated annealing for MMR-Path
Francisco Pérez
 

Mais procurados (20)

Mm2521542158
Mm2521542158Mm2521542158
Mm2521542158
 
14_autoencoders.pdf
14_autoencoders.pdf14_autoencoders.pdf
14_autoencoders.pdf
 
High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...
 
15_representation.pdf
15_representation.pdf15_representation.pdf
15_representation.pdf
 
GENERIC APPROACH FOR VISIBLE WATERMARKING
GENERIC APPROACH FOR VISIBLE WATERMARKINGGENERIC APPROACH FOR VISIBLE WATERMARKING
GENERIC APPROACH FOR VISIBLE WATERMARKING
 
Algorithm
AlgorithmAlgorithm
Algorithm
 
11.[23 36]quadrature radon transform for smoother tomographic reconstruction
11.[23 36]quadrature radon transform for smoother  tomographic reconstruction11.[23 36]quadrature radon transform for smoother  tomographic reconstruction
11.[23 36]quadrature radon transform for smoother tomographic reconstruction
 
11.quadrature radon transform for smoother tomographic reconstruction
11.quadrature radon transform for smoother  tomographic reconstruction11.quadrature radon transform for smoother  tomographic reconstruction
11.quadrature radon transform for smoother tomographic reconstruction
 
20110319 parameterized algorithms_fomin_lecture01-02
20110319 parameterized algorithms_fomin_lecture01-0220110319 parameterized algorithms_fomin_lecture01-02
20110319 parameterized algorithms_fomin_lecture01-02
 
Graphical Models In Python | Edureka
Graphical Models In Python | EdurekaGraphical Models In Python | Edureka
Graphical Models In Python | Edureka
 
Principal Sensitivity Analysis
Principal Sensitivity AnalysisPrincipal Sensitivity Analysis
Principal Sensitivity Analysis
 
Graphical Model Selection for Big Data
Graphical Model Selection for Big DataGraphical Model Selection for Big Data
Graphical Model Selection for Big Data
 
20110319 parameterized algorithms_fomin_lecture03-04
20110319 parameterized algorithms_fomin_lecture03-0420110319 parameterized algorithms_fomin_lecture03-04
20110319 parameterized algorithms_fomin_lecture03-04
 
Chapter 5 Thomas 9 Ed F 2008
Chapter 5 Thomas 9 Ed F 2008Chapter 5 Thomas 9 Ed F 2008
Chapter 5 Thomas 9 Ed F 2008
 
Lar calc10 ch01_sec4
Lar calc10 ch01_sec4Lar calc10 ch01_sec4
Lar calc10 ch01_sec4
 
Stratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computationStratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computation
 
改进的固定点图像复原算法_英文_阎雪飞
改进的固定点图像复原算法_英文_阎雪飞改进的固定点图像复原算法_英文_阎雪飞
改进的固定点图像复原算法_英文_阎雪飞
 
Fulltext
FulltextFulltext
Fulltext
 
Применение машинного обучения для навигации и управления роботами
Применение машинного обучения для навигации и управления роботамиПрименение машинного обучения для навигации и управления роботами
Применение машинного обучения для навигации и управления роботами
 
Simulated annealing for MMR-Path
Simulated annealing for MMR-PathSimulated annealing for MMR-Path
Simulated annealing for MMR-Path
 

Destaque

Progress on Semi-Supervised Machine Learning Techniques
Progress on Semi-Supervised Machine Learning TechniquesProgress on Semi-Supervised Machine Learning Techniques
Progress on Semi-Supervised Machine Learning Techniques
butest
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
butest
 
June 16, 2008
June 16, 2008June 16, 2008
June 16, 2008
butest
 
BINF 760 new course approved
BINF 760 new course approvedBINF 760 new course approved
BINF 760 new course approved
butest
 
ITRrefs.doc
ITRrefs.docITRrefs.doc
ITRrefs.doc
butest
 
LECTURE8.PPT
LECTURE8.PPTLECTURE8.PPT
LECTURE8.PPT
butest
 

Destaque (8)

doc
docdoc
doc
 
Progress on Semi-Supervised Machine Learning Techniques
Progress on Semi-Supervised Machine Learning TechniquesProgress on Semi-Supervised Machine Learning Techniques
Progress on Semi-Supervised Machine Learning Techniques
 
divani componibili
divani componibilidivani componibili
divani componibili
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
June 16, 2008
June 16, 2008June 16, 2008
June 16, 2008
 
BINF 760 new course approved
BINF 760 new course approvedBINF 760 new course approved
BINF 760 new course approved
 
ITRrefs.doc
ITRrefs.docITRrefs.doc
ITRrefs.doc
 
LECTURE8.PPT
LECTURE8.PPTLECTURE8.PPT
LECTURE8.PPT
 

Semelhante a BALANCING BOARD MACHINES

Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Section5 Rbf
Section5 RbfSection5 Rbf
Section5 Rbf
kylin
 
Multilayer Neural Networks
Multilayer Neural NetworksMultilayer Neural Networks
Multilayer Neural Networks
ESCOM
 
Solution of nonlinear_equations
Solution of nonlinear_equationsSolution of nonlinear_equations
Solution of nonlinear_equations
Tarun Gehlot
 
Global Optimization with Descending Region Algorithm
Global Optimization with Descending Region AlgorithmGlobal Optimization with Descending Region Algorithm
Global Optimization with Descending Region Algorithm
Loc Nguyen
 
proposal_pura
proposal_puraproposal_pura
proposal_pura
Erick Lin
 

Semelhante a BALANCING BOARD MACHINES (20)

Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Newton cotes integration method
Newton cotes integration  methodNewton cotes integration  method
Newton cotes integration method
 
Presentation of volesti in eRum 2020
Presentation of volesti in eRum 2020 Presentation of volesti in eRum 2020
Presentation of volesti in eRum 2020
 
Section5 Rbf
Section5 RbfSection5 Rbf
Section5 Rbf
 
Multilayer Neural Networks
Multilayer Neural NetworksMultilayer Neural Networks
Multilayer Neural Networks
 
Hoip10 articulo surface reconstruction_upc
Hoip10 articulo surface reconstruction_upcHoip10 articulo surface reconstruction_upc
Hoip10 articulo surface reconstruction_upc
 
Single Variable Calculus Assignment Help
Single Variable Calculus Assignment HelpSingle Variable Calculus Assignment Help
Single Variable Calculus Assignment Help
 
Single Variable Calculus Assignment Help
Single Variable Calculus Assignment HelpSingle Variable Calculus Assignment Help
Single Variable Calculus Assignment Help
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Single Variable Calculus Assignment Help
Single Variable Calculus Assignment HelpSingle Variable Calculus Assignment Help
Single Variable Calculus Assignment Help
 
Integration
IntegrationIntegration
Integration
 
AI Lesson 29
AI Lesson 29AI Lesson 29
AI Lesson 29
 
Lesson 29
Lesson 29Lesson 29
Lesson 29
 
Solution of nonlinear_equations
Solution of nonlinear_equationsSolution of nonlinear_equations
Solution of nonlinear_equations
 
5 4 Notes
5 4 Notes5 4 Notes
5 4 Notes
 
L 4 4
L 4 4L 4 4
L 4 4
 
Quantum Deep Learning
Quantum Deep LearningQuantum Deep Learning
Quantum Deep Learning
 
Global Optimization with Descending Region Algorithm
Global Optimization with Descending Region AlgorithmGlobal Optimization with Descending Region Algorithm
Global Optimization with Descending Region Algorithm
 
Polyhedral computations in computational algebraic geometry and optimization
Polyhedral computations in computational algebraic geometry and optimizationPolyhedral computations in computational algebraic geometry and optimization
Polyhedral computations in computational algebraic geometry and optimization
 
proposal_pura
proposal_puraproposal_pura
proposal_pura
 

Mais de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 

Mais de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

BALANCING BOARD MACHINES

  • 1. BALANCING BOARD MACHINES Frederic Maire, School of Software Engineering and Data Communication, Queensland University of Technology, Box 2434, Brisbane, Qld 4001, Australia f.maire@qut.edu.au Abstract such that k ( x, y ) = φ ( x),φ ( y ) . With such a kernel The support vector machine solution corresponds to the function k , the computation of inner products center of the largest sphere inscribed in version space. φ ( x), φ ( y ) does not require the explicit knowledge of Alternative approaches like Bayesian Point Machine (BPM) and Analytic Center Machine have suggested that φ . In fact for a given kernel function k , there may exist the generalization performance can be further enhanced many different mappings φ . Geometrically, each training example ( xi , y i ) defines a half-space in feature by considering other possible centers of version space like the centroid (center of mass). We present an algorithm to compute exactly the centroid of higher dimensional space through the constraint y i w, φ (x i ) > 0 on w . polyhedra, then derive approximation algorithms to build It is easy to see that version space is a polyhedral cone of a new learning machine whose performance is feature space. comparable to BPM. Figure 1 shows a bird-eye view (slice of the polyhedral Key Words cone) of version space. Kernel machines, Bayesian Point, Centroid. 1. Introduction Kernel classifiers are non-linear decision functions for binary classification. In the Kernel Machine framework (Muller & Mika & Ratch & Tsuda & Scholkopf, [1]; Scholkopft & Smola, [2]), a feature mapping x  φ (x) from an input space to a feature space is given (generally, implicitly via a kernel function), as well as a training set { } of input vectors x 1 ,  , x m with the corresponding class labels { y1 ,  , y m } where y i ∈ { − 1,+1} . The learning problem is formulated as a search problem for a linear classifier hypothesis (a weight vector w ) belonging Figure 1: An elongated version space. The SVM point is to a subset of the feature space called version space; the centre of the sphere. { w | ∀i ∈ [1, m], y w, φ (x i ) > 0} . In other words, i version space is the set of weight vectors w that are The SVM solution point wSVM is the centre of the largest consistent with the training set. Because only the sphere whose centre is a unit vector and is contained in direction of w matters for classification purpose, without the polyhedral cone. loss of generality, we can restrict the search for w to the unit sphere in feature space. The training algorithm of a Bayes Point Machines (BPM) are a well-founded Support Vector Machine (SVM) returns the weight vector improvement which approximates the Bayes-optimal w that has the smallest maximum angle between w and decision by the centroid (also known as the centre of mass the y iφ ( xi ) ’s. The Kernel trick is that for certain or barycentre) of version space. It happens that the Bayes point is very close to the centroid of version space in high feature spaces and mappings φ , there exist easily dimensional spaces. The Bayes point achieves better computable kernel functions k defined on the input space generalization performance in comparison to SVMs
(Opper & Haussler, [3]; Shawe-Taylor & Williamson, [4]; Graepel & Herbrich & Campbell, [5]; Watkin, [6]).

An intuitive way to see why the centroid is a good choice is to view version space as a committee of experts who all agree on the training set. A new input vector x corresponds to a hyperplane in feature space that may cut version space in two parts. In the example of Figure 1, the experts on the right of the hyperplane normal to φ(x) classify x positively, whereas the experts on the left classify x negatively. It is reasonable to use the opinion of the majority of the experts that successfully classified the training set to predict the class label of x. The expert that agrees the most with the majority vote on new inputs is precisely the Bayes point. In a standard committee machine, for each new input we seek the opinions of a finite number of experts and then take a majority vote, whereas in a BPM, the expert that most often agrees with the majority vote of the infinite committee (version space) is delegated the task of classifying the new input.

Following Ruján [7], Herbrich and Graepel [8] introduced two algorithms to stochastically approximate the centroid of version space: a billiard sampling algorithm and a sampling algorithm based on the well-known perceptron algorithm.

In this paper, we present an algorithm to compute exactly the centroid of a polyhedron in a high dimensional space. From this exact algorithm, we derive an algorithm to approximate a centroid position in a polyhedral cone. We show empirically that the corresponding machine presents better generalization capability than SVMs on a number of benchmark data sets.

In section 2, we introduce an algorithm to compute exactly the centroid of higher dimensional polyhedra. In section 3, we show how to use this algorithm to approximate the centroid of version space. In section 4, some implementation issues are considered and some experimental results are presented.

2. Exact Computation of the Centroid of a Higher Dimensional Polyhedron

A polyhedron P is the intersection of a finite number of half-spaces. It is best represented by a system of non-redundant linear inequalities P = { x | Ax ≤ b }. Recall that the 1-volume is a length, the 2-volume is a surface area and the 3-volume is the every-day-life volume. The algorithm that we introduce for computing the centroid of an n-dimensional polyhedron is an extension of the work by Lasserre [10], who showed that the n-dimensional volume V(n, A, b) of a polyhedron P = { x | Ax ≤ b } is related to the (n-1)-dimensional volumes of its facets and the row vectors of its matrix A by the following formula:

V(n, A, b) = (1/n) Σ_i (b_i / ‖a_i‖) V_i(n-1, A, b)

where a_i denotes the ith row of A and V_i(n-1, A, b) denotes the (n-1)-dimensional volume of the ith facet. The computation of the centroid and the (n-1)-volume of a facet is done by variable elimination. Geometrically, this amounts to projecting the facet onto an axis-parallel hyperplane, then computing the volume and the centroid of this projection recursively in a lower dimensional space. From the volume and centroid of the projected facet, we can derive the centroid and volume of the original facet.

The formulae below are obtained by considering the n-fold integral defining the n-dimensional volume and decomposing the polyhedron into cones. The centroid of a polyhedron can be computed recursively in the following manner:

• Compute recursively the centroids G_Fi and the (n-1)-volumes V_Fi of each facet (face of dimension n-1) F_i of P. Each facet F_i corresponds to the intersection of P with the hyperplane defined by the ith row of the system Ax ≤ b.

• Compute G_E = Σ_i (V_Fi / Σ_j V_Fj) G_Fi, the centroid of the envelope of P (the union of the facets F_i).

• Compute the centroids G_Ci and the n-volumes V_Ci of the cones C_i = cone(G_E, F_i) rooted at G_E. If h_i is the distance from G_E to the hyperplane containing F_i, then V_Ci = (h_i / n) V_Fi, and the vectors from G_E satisfy G_E G_Ci = (n / (n+1)) G_E G_Fi.

• Compute G, the centroid of P, as the weighted sum G = Σ_i (V_Ci / Σ_j V_Cj) G_Ci.
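A small numerical check may make the recursion concrete. The sketch below (a hypothetical Python/NumPy illustration, not the paper's Matlab implementation) applies Lasserre's volume relation and the cone decomposition above to the triangle { x ≥ 0, y ≥ 0, x + y ≤ 1 }, whose area (1/2) and centroid (1/3, 1/3) are known in closed form; the facet lengths and midpoints that the recursive calls would normally produce are hard-coded here.

```python
import numpy as np

# triangle {x >= 0, y >= 0, x + y <= 1} as Ax <= b; area 1/2, centroid (1/3, 1/3)
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
n = 2

# facet data (produced by recursive calls in general; hard-coded for this 2-D check):
# facet i lies on a_i x = b_i; its 1-volume is a length and its centroid a midpoint
V_F = np.array([1.0, 1.0, np.sqrt(2.0)])
G_F = np.array([[0.0, 0.5], [0.5, 0.0], [0.5, 0.5]])

# Lasserre's relation: V(n, A, b) = (1/n) * sum_i (b_i / ||a_i||) * V_i(n-1, A, b)
row_norms = np.linalg.norm(A, axis=1)
vol = (b / row_norms * V_F).sum() / n
print(vol)                               # 0.5

# centroid recursion of section 2
G_E = V_F @ G_F / V_F.sum()              # centroid of the envelope
h = np.abs(A @ G_E - b) / row_norms      # distance from G_E to each facet hyperplane
V_C = h / n * V_F                        # cone volumes V_Ci = (h_i / n) V_Fi
G_C = G_E + n / (n + 1) * (G_F - G_E)    # cone centroids: G_E G_Ci = n/(n+1) G_E G_Fi
G = V_C @ G_C / V_C.sum()                # weighted sum over the cones
print(V_C.sum(), G)                      # 0.5  [0.3333... 0.3333...]
```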
It is useful to observe that the computation of the volume and the centroid of an (n-1)-dimensional polyhedron embedded in an n-dimensional space is identical to the computation of the volume and the centroid of a facet of an n-dimensional polyhedron. For further details, see the Matlab source code at http://www.fit.qut.edu.au/~maire/G.

3. Balancing Board Machines

3.1 A Mechanical Point of View

The point of contact of a board posed in equilibrium on a sphere (assumed to be the only source of gravity) is the centroid of the board. This observation is the basis of our "balancing board algorithm". In the rest of this paper, the term "board" will refer to the intersection of the polyhedral cone of version space with a hyperplane normal to a unit vector w of version space. This definition implies that if the polyhedral cone is n-dimensional then a board will be an (n-1)-dimensional polyhedron tangent to the unit sphere.

Statistical learning theory (Scholkopf & Smola, [2]) tells us that the Bayes point w belongs to the vector subspace V generated by the family of vectors {φ(x_1), ..., φ(x_m)}, that is, w is of the form w = Σ_j α_j φ(x_j).

Once we know an orthonormal basis of V (the orthonormality is with respect to the inner product in feature space corresponding to the kernel function in the input space), we can express the polyhedral cone inequalities with respect to this orthonormal basis. Then we can apply the formulae of section 2 to compute the centroid of any polyhedron expressed in this orthonormal basis. The kernel PCA basis is an orthonormal basis B of V. Its basis vectors are derived from the eigenvectors of the symmetric matrix K = { k(x_i, x_j) }_{i,j}. By expressing the polyhedral cone defined by the training examples in B, we will be able to approximate a centroid direction with the board balancing algorithm sketched in section 3.1 and detailed below.

Figure 2: Top left, initial board. Top right, after one iteration. Bottom left, after two iterations. Bottom right, after three iterations.

3.2 Exploring the Polyhedral Cone

In the algorithm we propose, the approximation w of the centroid direction of the cone is refined by computing the centroid of the board normal to w, and then rotating w towards the centroid of the board (stopping at a local minimum of the volume of the board in this line search).

Figure 2 illustrates the balancing process of a board. Notice that Figure 2 is simply an illustration, as in dimension 2 the line search would succeed in a single iteration!

The complexity of the algorithm of section 2 for computing the centroid exactly is unfortunately exponential. The computational cost of the exact calculation is too high even for medium size data sets. However, the recursive formulae allow us to derive an approximation of the volume and the centroid of a polyhedron once we have approximations for the volumes and the centroids of its facets.
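As an illustration of this change of basis, here is a hypothetical Python/NumPy sketch with made-up data (the paper's own implementation is in Matlab). In the standard kernel PCA construction, the eigenvectors of K are rescaled by 1/√λ_i, giving a coordinate matrix U with Uᵀ K U = I, i.e. the corresponding feature-space vectors form an orthonormal basis of V:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                        # toy input vectors
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)                               # Gaussian kernel matrix, sigma = 1

# kernel PCA basis: eigenvector i of K, rescaled by 1/sqrt(lambda_i), gives the
# coordinates of a feature-space vector sum_j U[j, i] phi(x_j) of unit norm
lam, V = np.linalg.eigh(K)
U = V / np.sqrt(lam)
print(np.allclose(U.T @ K @ U, np.eye(len(X))))     # True: orthonormal in feature space
```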
Because the balancing board algorithm requires several board centroid estimations, it is desirable to recycle intermediate results as much as possible to achieve a significant reduction in computation time. Because the intersection of a hyperplane and a spheric cone is an ellipsoid, we estimate the volume and the centroid of the intersection of the board and a facet of the polyhedral cone (this intersection is (n-2)-dimensional) with the volume and the centroid of the intersection of the board and the largest spheric cone contained in the facet (this spheric cone is (n-1)-dimensional). The computation of these largest spheric cones is done only once. The centre of the ellipsoid and its quadratic matrix are easily derived from the centre and radius of the spheric cone. These derivations are explained in the next sub-sections.

To simplify the computations, we have restricted our study to non-singular kernel matrices (like those obtained from Gaussian kernels).

3.2.1 Change of Basis

Let w_B be the coordinates of w with respect to {φ(x_1), ..., φ(x_m)}. Recall that the kernel PCA basis is made of the eigenvectors of K. Let w_U be the coordinates of w with respect to the kernel PCA orthonormal basis {u_1, ..., u_m}. We have w_B = U w_U. Let M = K + λI, where λ is a non-negative regularization parameter as in (Herbrich et al., [9]). We are looking for w_B such that -diag(y) M w_B ≤ 0 with ⟨w, w⟩ = 1 and w near the centroid direction of the polyhedral cone. As we have ⟨w, w⟩ = (w_U)ᵀ w_U, in practice we look for the centroid direction of the cone

-diag(y) M U w_U ≤ 0,   (w_U)ᵀ w_U = 1.   (2)
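Continuing in the same hypothetical setting (toy data and toy labels; λ is the regularization parameter above, set here to an arbitrary small value), the cone of equation (2) can be assembled explicitly, with its rows normalized as assumed in section 3.2.2 below:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                           # toy inputs
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)                                  # Gaussian kernel matrix
lam_eig, V = np.linalg.eigh(K)
U = V / np.sqrt(lam_eig)                               # kernel PCA coordinates (as above)

y = np.where(rng.normal(size=len(X)) > 0, 1.0, -1.0)   # toy labels in {-1, +1}
M = K + 1e-3 * np.eye(len(X))                          # M = K + lambda I

# version-space cone of equation (2) in kernel PCA coordinates: A w_U <= 0
A = -np.diag(y) @ M @ U
A /= np.linalg.norm(A, axis=1, keepdims=True)          # unit-norm rows
```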
3.2.2 Computing the Spheric Centre of a Polyhedral Cone Derived from a Non-Singular Mercer Kernel Matrix

Let Ax ≤ 0 be the non-empty polyhedral cone derived from the kernel matrix. The matrix A is square (m = n). Without loss of generality, we assume that its rows are normalized; that is, each row is a vector of norm 1. Recall that the spheric centre w_s of the cone (the direction of the centre of the largest sphere that is contained in the polyhedral cone and whose centre is at distance one from zero) corresponds to the SVM solution. Because A is square and non-singular, each facet of the polyhedral cone touches the largest sphere. If each facet is moved by a distance of one in the direction of its normal vector, the new cone obtained is a translation of the original cone in the direction of w_s. That is, w_s can be obtained by solving Ax = -1 (the right-hand side being the vector whose entries are all -1).

Once the unit direction u = w_s / ‖w_s‖ of the spheric centre of the polyhedral cone is determined, the radius r of the largest sphere centred at u can be computed. Here the radius of the spheric cone is defined as the radius of the largest (n-1)-sphere contained in the intersection of the cone and the hyperplane uᵀx = 1. We use this definition to avoid geodesics.

3.2.3 Computation of r

We write A(k,:) to denote the kth row of matrix A. For each i, let α_i = π/2 - acos(-A(i,:) u). The scalar r is the minimum of tan(α_i) over all i. If we are interested in the attributes of the cone contained in the facet A(k,:) x = 0, we simply solve the system

A([1:k-1, k+1:end], :) x = -1,   A(k,:) x = 0.

3.2.4 Spheric Cone Equation

Given the characteristic attributes u and r of a spheric cone, we can derive a simple equation for the cone. Let z = (uᵀx) u and y = x - z. The cone equation is yᵀy = r² zᵀz. An alternative equation (derived from Pythagoras' theorem) is xᵀx = (1 + r²)(uᵀx)².

Our estimation of the volume and centroid of the board requires the estimation of the volume and centroid of the intersection of a cone and two hyperplanes (namely a facet and the hyperplane containing the board).
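Before turning to that intersection, note that the computations of sections 3.2.2 and 3.2.3 reduce to one linear solve and a few trigonometric operations. The following sketch (hypothetical code and function name, not from the paper) recovers u and r, and checks them on the planar cone { x ≤ 0, y ≤ 0 }, whose spheric centre direction is (-1/√2, -1/√2) and whose radius is 1:

```python
import numpy as np

def spheric_centre_and_radius(A):
    """u and r for the cone {x : A x <= 0}, with A square, non-singular,
    and unit-norm rows (sections 3.2.2 and 3.2.3)."""
    ws = np.linalg.solve(A, -np.ones(len(A)))            # A ws = -1 (vector of -1's)
    u = ws / np.linalg.norm(ws)                          # unit direction of the centre
    alphas = np.pi / 2 - np.arccos(np.clip(-A @ u, -1.0, 1.0))
    return u, np.tan(alphas).min()                       # r = min_i tan(alpha_i)

u, r = spheric_centre_and_radius(np.eye(2))              # cone {x <= 0, y <= 0}
print(u, r)                                              # [-0.707... -0.707...] 1.0
```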
3.2.5 Intersection of a Spheric Cone and Two Hyperplanes

Consider the cone xᵀx = (1 + r²)(uᵀx)² contained in the kth facet (that is, A(k,:) u = 0). Let us compute the ellipsoid defined by the intersection of this cone and the hyperplane wᵀx = 1 (w is normal to the board).

Let Q = [q_1, ..., q_{n-2}] = null([wᵀ; A(k,:)]). Let h be the intersection of the ray defined by u and the hyperplane wᵀx = 1. Let us make the change of variables x = h + Qz. One can easily check that for all z ∈ R^{n-2}, wᵀ(h + Qz) = 1 and A(k,:)(h + Qz) = 0.

We derive now the equation of the ellipsoid with respect to z. From xᵀx = (1 + r²)(uᵀx)², we obtain

(h + Qz)ᵀ(h + Qz) = (1 + r²)(uᵀh + uᵀQz)².

After developing the expression, we get

hᵀh + 2hᵀQz + zᵀQᵀQz = (1 + r²)(uᵀh)² + (1 + r²)(zᵀQᵀuuᵀQz) + (1 + r²)(2uᵀh uᵀQz).

After regrouping, we have

zᵀQᵀ(I_n - (1 + r²)uuᵀ)Qz + 2(h - (1 + r²)(uᵀh)u)ᵀQz + hᵀh - (1 + r²)(uᵀh)² = 0.

From this expression we can derive an expression of the form (z - c)ᵀD(z - c) = b that will tell us the (n-2)-volume of the ellipsoid and its centre. It is easy to check that h + Qc is the centre of the ellipsoid in Rⁿ.

In the next subsection, we show how to compute the volume of the ellipsoid.

3.2.6 Volume of an Ellipsoid (x - c)ᵀD(x - c) = b

Without loss of generality, we assume that c = 0 and b = 1. The matrix D is symmetric non-negative, therefore there exists a decomposition D = P∆Pᵀ where P is orthogonal and ∆ is non-negative and diagonal. Let y = Pᵀx; the equation of the ellipsoid becomes Σ_i δ_i y_i² = 1. Let z_i = √δ_i y_i. We can check that if y is on the ellipsoid then z is on the unit sphere, and reciprocally. That is, the ellipsoid is obtained from the unit sphere by the linear transformation of matrix M = diag(1/√δ_1, ..., 1/√δ_n).

Recall that if f(x) = Mx is a linear transformation and S is a subset of the vector space, then vol(f(S)) = abs(det(M)) × vol(S). The volume of the ellipsoid is therefore Π_{i=1..n} (1/√δ_i) times the volume of the unit n-sphere.

For completeness, let us mention that the volume of an n-sphere of radius r is (2rⁿ/n) × π^{n/2} / Γ(n/2), and the volume of the n-rectangle containing the ellipsoid is 2ⁿ × Π_{i=1..n} (1/√δ_i).

3.2.7 Distance from w to a Facet

To compute the (n-1)-volume of the intersection of the board wᵀx = 1 and the polyhedral cone P, we need to find, for each facet A(k,:) x ≤ 0, the point x in the plane generated by w and A(k,:)ᵀ that belongs to this intersection; that is, the orthogonal projection of w on the hinge defined as the intersection of the board and the kth cone facet. The point x = αA(k,:)ᵀ + βw must satisfy

wᵀx = 1,   A(k,:) x = 0.

That is, γα + β = 1 and α + γβ = 0, where γ = A(k,:) w. Hence

x = [A(k,:)ᵀ  w] × [γ 1; 1 γ]⁻¹ × [1; 0].
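The determinant rule of section 3.2.6 is easy to sanity-check numerically. This hypothetical sketch (function names are mine, not from the paper's Matlab code) computes the volume of { x : xᵀDx ≤ b } from the eigenvalues of D and verifies it on an ellipse with semi-axes 2 and 3, whose area is 6π:

```python
import numpy as np
from math import pi, gamma

def unit_ball_volume(n):
    # volume of the unit n-sphere: (2/n) pi^(n/2) / Gamma(n/2) = pi^(n/2) / Gamma(n/2 + 1)
    return pi ** (n / 2) / gamma(n / 2 + 1)

def ellipsoid_volume(D, b=1.0):
    # {x : x^T D x <= b} is the image of the unit ball under M = P diag(sqrt(b/delta)) P^T,
    # so its volume is the unit-ball volume scaled by |det M| = prod_i sqrt(b / delta_i)
    deltas = np.linalg.eigvalsh(D)
    return unit_ball_volume(len(deltas)) * np.sqrt(np.prod(b / deltas))

print(ellipsoid_volume(np.diag([1 / 4.0, 1 / 9.0])), pi * 2 * 3)   # both ~ 18.8495
```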
4. Implementation Issues and Experimental Results

We have implemented the exact computation of the centroid and the volume in Matlab. A direct recursive implementation of Lasserre's formula would be very inefficient, as faces of dimension k share faces of dimension k-1. Our implementation caches the volumes and centroids of the lower dimensional faces in a hash-table.

Our algorithm has been validated by comparing the values returned with a Monte-Carlo method.

Lasserre's formula is valid only if the polyhedron is represented as a system of non-redundant linear inequalities, so redundancy must be detected and eliminated using linear optimization.

The kernel matrix of a Gaussian kernel can only be singular when identical input vectors occur more than once in the training set. We remove repeated occurrences of the same input vector and assign the most common label for this input vector to the occurrence that we leave in the training set.

The table which follows summarises the generalization performance (percentage of correct predictions on test sets) of the Balancing Board Machine (BBM) on 6 standard benchmarking data sets from the UCI Repository, comparing results for illustrative purposes with equivalent hard margin support vector machines. In each case the data was randomly partitioned into 20 training and test sets in the ratio 60%:40%.

Data set        SVM     BBM
heart disease   58.36   58.40
thyroid         94.34   95.23
diabetes        66.89   67.68
waveform        83.50   83.50
sonar           85.06   85.78
ionosphere      86.79   86.86

The results obtained with a BBM are comparable to those obtained with a BPM, but the improvement is not always as dramatic as those reported in (Herbrich et al., [9]). We observed that the improvement was generally better for smaller data sets. We suspect that this is due to the fact that the volumes considered become very small in high dimensional spaces. In fact, on a PC, unit spheres "vanish" when their dimension exceeds 340: the volume of a unit sphere of dimension 340 is about 4.5 × 10^-223. This is why we consider the logarithm of the volume in our programs.
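Working in log-space is a one-line change given the closed form of section 3.2.6. The sketch below (hypothetical Python, not the paper's Matlab code) evaluates the log-volume directly via lgamma and reproduces the 4.5 × 10^-223 figure quoted above:

```python
import math

def log_unit_ball_volume(n):
    # log of pi^(n/2) / Gamma(n/2 + 1), computed without underflow via lgamma
    return (n / 2) * math.log(math.pi) - math.lgamma(n / 2 + 1)

print(log_unit_ball_volume(340))            # about -512 (natural log)
print(math.exp(log_unit_ball_volume(340)))  # about 4.5e-223
```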
5. Conclusion

The exact computation algorithm can be useful for benchmarking to people developing new centroid approximation algorithms. We do not claim that our BBM approach is superior to any other, given that its computational cost is in the order of m times the cost of an SVM computation (where m is the number of training examples).

Replacing the ellipsoids with a more accurate estimation would probably give better results, but deriving the volume and the centroid of the intersection of a facet and a board from the volume and the centroid of the intersection of the same facet with another board seems to be a hard problem.

The computation of the SVM point presented in section 3.2.2 provides an efficient learning algorithm for Gaussian kernels.

6. Acknowledgement

I would like to thank Professor Tom Downs and Professor Peter Bartlett for their valuable comments on a previous version of the BBM algorithm. This work was partially supported by an ATN grant.

References

[1] Muller, K., Mika, S., Ratsch, G., Tsuda, K., and Scholkopf, B., An Introduction to Kernel-Based Learning Algorithms, IEEE Trans. on Neural Networks, vol. 12, no. 2, 2001, pp. 181-201.

[2] Scholkopf, B., and Smola, A., Learning with Kernels, http://www.kernel-machines.org/

[3] Opper, M., and Haussler, D., Generalization performance of Bayes optimal classification algorithm for learning a perceptron, Phys. Rev. Lett., vol. 66, p. 2677, 1991.

[4] Shawe-Taylor, J., and Williamson, R. C., A PAC analysis of a Bayesian estimator, Royal Holloway, Univ. London, Tech. Rep. NC2-TR-1997-013, 1997.

[5] Graepel, T., Herbrich, R., and Campbell, C., Bayes point machines: Estimating the Bayes point in kernel space, in Proc. IJCAI Workshop on Support Vector Machines, 1999, pp. 23-27.

[6] Watkin, T., Optimal learning with a neural network, Europhys. Lett., vol. 21, 1993, pp. 871-877.

[7] Ruján, P., Playing billiard in version space, Neural Comput., vol. 9, 1996, pp. 197-238.

[8] Herbrich, R., and Graepel, T., Large scale Bayes point machines, Advances in Neural Information Processing Systems 13, 2001.
[9] Herbrich, R., Graepel, T., and Campbell, C., Bayes Point Machines, Journal of Machine Learning Research, vol. 1, 2001, pp. 245-279.

[10] Lasserre, J., An Analytical Expression and an Algorithm for the Volume of a Convex Polyhedron in R^n, Journal of Optimization Theory and Applications, vol. 39, no. 3, 1983.

[11] Schrijver, A., Theory of Linear and Integer Programming, Wiley-Interscience, 1990.

[12] Trafalis, T. B., and Malyscheff, A. M., An Analytic Center Machine, Machine Learning, vol. 46, 2002, pp. 203-223.