Nov 23rd, 2001. Copyright © 2001, 2003, Andrew W. Moore
Support Vector Machines
Modified from the slides by Dr. Andrew W. Moore
http://www.cs.cmu.edu/~awm/tutorials
Support Vector Machines: Slide 2
Linear Classifiers
x → f(x, w, b) → y^est
(figure legend: one symbol denotes +1, the other denotes -1)
f(x, w, b) = sign(w · x - b)
How would you classify this data?
Support Vector Machines: Slides 3-5 (identical text to Slide 2; the figures show different candidate separating lines for the same data)
Support Vector Machines: Slide 6
Linear Classifiers
f(x, w, b) = sign(w · x - b)
Any of these would be fine...
...but which is best?
Support Vector Machines: Slide 7
Classifier Margin
f(x, w, b) = sign(w · x - b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Support Vector Machines: Slide 8
Maximum Margin
f(x, w, b) = sign(w · x - b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM).
Linear SVM
Support Vector Machines: Slide 9
Maximum Margin
Support Vectors are those datapoints that the margin pushes up against.
Linear SVM
Support Vector Machines: Slide 10
Why Maximum Margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy since the model is immune to removal of any non-support-vector datapoints.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very very well.
Support Vector Machines: Slide 11
Estimate the Margin
• What is the distance expression for a point x to the line wx + b = 0?

$$ d(\mathbf{x}) = \frac{|\mathbf{x} \cdot \mathbf{w} + b|}{\|\mathbf{w}\|_2} = \frac{|\mathbf{x} \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}} $$
Support Vector Machines: Slide 12
Estimate the Margin
• What is the expression for margin?

$$ \text{margin} = \min_{\mathbf{x} \in D} d(\mathbf{x}) = \min_{\mathbf{x} \in D} \frac{|\mathbf{x} \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}} $$
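The two expressions above drop straight into numpy. A minimal sketch (not from the slides; the points, labels, and the candidate w and b are invented for illustration):

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.5], [-1.0, -1.5], [-2.0, -0.5]])  # toy points
y = np.array([1, 1, -1, -1])                                        # toy labels
w = np.array([1.0, 1.0])                                            # candidate weights
b = 0.5                                                             # candidate bias

d = np.abs(X @ w + b) / np.linalg.norm(w)   # distance of each point to w.x + b = 0
margin = d.min()                            # margin = min over the dataset
print(d, margin)
```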
Support Vector Machines: Slide 13
Maximize Margin

$$ \operatorname*{argmax}_{\mathbf{w},b} \; \text{margin}(\mathbf{w}, b, D) = \operatorname*{argmax}_{\mathbf{w},b} \; \min_{\mathbf{x}_i \in D} d(\mathbf{x}_i) = \operatorname*{argmax}_{\mathbf{w},b} \; \min_{\mathbf{x}_i \in D} \frac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}} $$
Support Vector Machines: Slide 14
Maximize Margin

$$ \operatorname*{argmax}_{\mathbf{w},b} \; \min_{\mathbf{x}_i \in D} \frac{|\mathbf{x}_i \cdot \mathbf{w} + b|}{\sqrt{\sum_{i=1}^{d} w_i^2}} \quad \text{subject to } \forall \mathbf{x}_i \in D : y_i (\mathbf{x}_i \cdot \mathbf{w} + b) \ge 0 $$

• Min-max problem → game problem
Support Vector Machines: Slide 15
Maximize Margin
Strategy: require $\min_{\mathbf{x}_i \in D} |\mathbf{x}_i \cdot \mathbf{w} + b| = 1$. The problem becomes

$$ \operatorname*{argmin}_{\mathbf{w},b} \; \sum_{i=1}^{d} w_i^2 \quad \text{subject to } \forall \mathbf{x}_i \in D : y_i (\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1 $$
Support Vector Machines: Slide 16
Maximum Margin Linear Classifier
• How to solve it?

$$ \{\mathbf{w}^*, b^*\} = \operatorname*{argmin}_{\mathbf{w},b} \; \sum_{k=1}^{d} w_k^2 $$
subject to
$$ y_1 (\mathbf{w} \cdot \mathbf{x}_1 + b) \ge 1, \quad y_2 (\mathbf{w} \cdot \mathbf{x}_2 + b) \ge 1, \quad \ldots, \quad y_N (\mathbf{w} \cdot \mathbf{x}_N + b) \ge 1 $$
Support Vector Machines: Slide 17
Learning via Quadratic Programming
• QP is a well-studied class of optimization algorithms that optimize a quadratic function of some real-valued variables subject to linear constraints.
Support Vector Machines: Slide 18
Quadratic Programming
Find
$$ \operatorname*{argmin}_{\mathbf{u}} \; c + \mathbf{d}^T \mathbf{u} + \frac{\mathbf{u}^T R \mathbf{u}}{2} $$
(quadratic criterion)
subject to n linear inequality constraints
$$ a_{11}u_1 + a_{12}u_2 + \ldots + a_{1m}u_m \le b_1 $$
$$ \vdots $$
$$ a_{n1}u_1 + a_{n2}u_2 + \ldots + a_{nm}u_m \le b_n $$
and subject to e additional linear equality constraints
$$ a_{(n+1)1}u_1 + a_{(n+1)2}u_2 + \ldots + a_{(n+1)m}u_m = b_{n+1} $$
$$ \vdots $$
$$ a_{(n+e)1}u_1 + a_{(n+e)2}u_2 + \ldots + a_{(n+e)m}u_m = b_{n+e} $$
Support Vector Machines: Slide 19 (repeats the Quadratic Programming template of Slide 18)
Support Vector Machines: Slide 20
Quadratic Programming
The maximum margin classifier is exactly in this form:
$$ \{\mathbf{w}^*, b^*\} = \operatorname*{argmin}_{\mathbf{w},b} \; \mathbf{w}^T \mathbf{w} \quad \text{subject to } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \text{ for all training data } (\mathbf{x}_i, y_i) $$
In the QP template: the quadratic criterion is w^T w (no linear or constant term), with the N inequality constraints
$$ y_1(\mathbf{w} \cdot \mathbf{x}_1 + b) \ge 1, \quad y_2(\mathbf{w} \cdot \mathbf{x}_2 + b) \ge 1, \quad \ldots, \quad y_N(\mathbf{w} \cdot \mathbf{x}_N + b) \ge 1 $$
and no equality constraints.
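As a concrete illustration, here is a sketch of this primal QP in the cvxopt solver (an assumed tool; the slides don't name one). Variables are stacked as u = (w, b), and each constraint y_i(w · x_i + b) ≥ 1 is rewritten in the G u ≤ h form the solver expects. Minimizing (1/2)||w||² has the same argmin as Σ_k w_k²; the tiny ridge on b just keeps the KKT system nonsingular.

```python
import numpy as np
from cvxopt import matrix, solvers

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])  # toy, separable
y = np.array([1.0, 1.0, -1.0, -1.0])
N, d = X.shape

P = np.zeros((d + 1, d + 1))
P[:d, :d] = np.eye(d)      # quadratic cost on w only: (1/2)||w||^2
P[d, d] = 1e-8             # tiny ridge on b so the KKT system stays nonsingular
q = np.zeros(d + 1)
G = -y[:, None] * np.hstack([X, np.ones((N, 1))])   # -y_i (x_i, 1) <= -1
h = -np.ones(N)

solvers.options['show_progress'] = False
sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
w, b = np.array(sol['x']).ravel()[:d], float(sol['x'][d])
print(w, b, np.sign(X @ w + b))    # recovers the labels on this toy set
```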
Support Vector Machines: Slide 21
Uh-oh!
This is going to be a problem! What should we do?
Support Vector Machines: Slide 22
Uh-oh!
Idea 1: Find minimum w·w, while minimizing the number of training set errors.
Problemette: Two things to minimize makes for an ill-defined optimization.
Support Vector Machines: Slide 23
Uh-oh!
Idea 1.1: Minimize w·w + C (#train errors), where C is a tradeoff parameter.
There's a serious practical problem that's about to make us reject this approach. Can you guess what it is?
Support Vector Machines: Slide 24
Uh-oh!
Idea 1.1: Minimize w·w + C (#train errors), where C is a tradeoff parameter.
Can't be expressed as a Quadratic Programming problem, and solving it may be too slow.
(Also, it doesn't distinguish between disastrous errors and near misses.)
Support Vector Machines: Slide 25
Uh-oh!
Idea 2.0: Minimize w·w + C (distance of error points to their correct place).
Support Vector Machines: Slide 26
Support Vector Machine (SVM) for Noisy Data
• Any problem with the above formulism?

$$ \{\mathbf{w}^*, b^*\} = \operatorname*{argmin}_{\mathbf{w},b,\varepsilon} \; \sum_{j=1}^{d} w_j^2 + c \sum_{i=1}^{N} \varepsilon_i $$
subject to
$$ y_1(\mathbf{w} \cdot \mathbf{x}_1 + b) \ge 1 - \varepsilon_1, \quad y_2(\mathbf{w} \cdot \mathbf{x}_2 + b) \ge 1 - \varepsilon_2, \quad \ldots, \quad y_N(\mathbf{w} \cdot \mathbf{x}_N + b) \ge 1 - \varepsilon_N $$
(in the figure, ε₁, ε₂, ε₃ mark the slacks of the three points on the wrong side of their margin)
Support Vector Machines: Slide 27
Support Vector Machine (SVM) for Noisy Data
• Balance the trade-off between margin and classification errors

$$ \{\mathbf{w}^*, b^*\} = \operatorname*{argmin}_{\mathbf{w},b,\varepsilon} \; \sum_{j=1}^{d} w_j^2 + c \sum_{i=1}^{N} \varepsilon_i $$
subject to
$$ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0 \quad (i = 1, \ldots, N) $$
Support Vector Machines: Slide 28
Support Vector Machine for Noisy Data

$$ \{\mathbf{w}^*, b^*\} = \operatorname*{argmin}_{\mathbf{w},b,\varepsilon} \; \sum_{j=1}^{d} w_j^2 + c \sum_{i=1}^{N} \varepsilon_i $$
subject to the N inequality constraints
$$ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0 \quad (i = 1, \ldots, N) $$

How do we determine the appropriate value for c?
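One standard answer (an assumption on our part; the slide leaves the question open) is to pick c by cross-validation. A sketch using scikit-learn's grid search on invented noisy data (scikit-learn calls the penalty C):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0, 1, -1)  # noisy toy labels

search = GridSearchCV(SVC(kernel='linear'),
                      param_grid={'C': [0.01, 0.1, 1, 10, 100]},
                      cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```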
Support Vector Machines: Slide 29
The Dual Form of QP
Maximize
$$ \sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl} \quad \text{where } Q_{kl} = y_k y_l (\mathbf{x}_k \cdot \mathbf{x}_l) $$
Subject to these constraints:
$$ 0 \le \alpha_k \le C \;\; \forall k, \qquad \sum_{k=1}^{R} \alpha_k y_k = 0 $$
Then define:
$$ \mathbf{w} = \sum_{k=1}^{R} \alpha_k y_k \mathbf{x}_k $$
Then classify with: f(x, w, b) = sign(w · x - b)
Support Vector Machines: Slide 30 (repeats the dual form of Slide 29)
Support Vector Machines: Slide 31
An Equivalent QP
Maximize
$$ \sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl} \quad \text{where } Q_{kl} = y_k y_l (\mathbf{x}_k \cdot \mathbf{x}_l) $$
Subject to these constraints:
$$ 0 \le \alpha_k \le C \;\; \forall k, \qquad \sum_{k=1}^{R} \alpha_k y_k = 0 $$
Then define:
$$ \mathbf{w} = \sum_{k=1}^{R} \alpha_k y_k \mathbf{x}_k $$
Datapoints with α_k > 0 will be the support vectors, so this sum only needs to be over the support vectors.
Support Vector Machines: Slide 32
Support Vectors
The two margin hyperplanes: w · x + b = 1 and w · x + b = -1.
The decision boundary is determined only by the support vectors!
$$ \mathbf{w} = \sum_{k=1}^{R} \alpha_k y_k \mathbf{x}_k $$
Complementary slackness at the optimum:
$$ \forall i: \; \alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0 $$
α_i = 0 for non-support vectors; α_i ≠ 0 for support vectors.
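A quick way to see this in practice is to fit a linear SVM and inspect which points end up with nonzero α. A sketch using scikit-learn's SVC (assumed tooling, not the slides' own code; its dual_coef_ stores α_k y_k for the support vectors only, so w can be rebuilt exactly as in the sum above):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5], [0.4, 0.6]])
y = np.array([1, 1, -1, -1, 1])

clf = SVC(kernel='linear', C=10.0).fit(X, y)
w = clf.dual_coef_ @ clf.support_vectors_   # sum_k alpha_k y_k x_k over support vectors
print(clf.support_)                         # indices of the support vectors
print(w, clf.intercept_)                    # reconstructed w and the fitted bias
```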
Support Vector Machines: Slide 33
The Dual Form of QP
(The same dual QP as Slide 29: maximize $\sum_k \alpha_k - \frac{1}{2}\sum_k \sum_l \alpha_k \alpha_l Q_{kl}$ with $Q_{kl} = y_k y_l (\mathbf{x}_k \cdot \mathbf{x}_l)$, subject to $0 \le \alpha_k \le C$ and $\sum_k \alpha_k y_k = 0$; then define $\mathbf{w} = \sum_k \alpha_k y_k \mathbf{x}_k$ and classify with f(x, w, b) = sign(w · x - b).)
How to determine b?
Support Vector Machines: Slide 34
An Equivalent QP: Determine b
Fix w and solve for b alone. Starting from
$$ \{\mathbf{w}^*, b^*\} = \operatorname*{argmin}_{\mathbf{w},b,\varepsilon} \; \sum_{j=1}^{d} w_j^2 + c \sum_{i=1}^{N} \varepsilon_i \quad \text{s.t. } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \varepsilon_i, \; \varepsilon_i \ge 0, $$
with w fixed this becomes
$$ b^* = \operatorname*{argmin}_{b,\varepsilon} \; \sum_{j=1}^{N} \varepsilon_j \quad \text{s.t. } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \varepsilon_i, \; \varepsilon_i \ge 0 \quad (i = 1, \ldots, N) $$
A linear programming problem!
Support Vector Machines: Slide 35
An Equivalent QP
Maximize
$$ \sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl} \quad \text{where } Q_{kl} = y_k y_l (\mathbf{x}_k \cdot \mathbf{x}_l) $$
Subject to: $0 \le \alpha_k \le C \;\forall k$ and $\sum_{k=1}^{R} \alpha_k y_k = 0$.
Then define $\mathbf{w} = \sum_{k=1}^{R} \alpha_k y_k \mathbf{x}_k$ and
$$ b = y_K (1 - \varepsilon_K) - \mathbf{x}_K \cdot \mathbf{w} \quad \text{where } K = \operatorname*{argmax}_k \alpha_k $$
Then classify with: f(x, w, b) = sign(w · x - b)
Datapoints with α_k > 0 will be the support vectors, so this sum only needs to be over the support vectors.
Why did I tell you about this equivalent QP?
• It's a formulation that QP packages can optimize more quickly.
• Because of further jaw-dropping developments you're about to learn.
Support Vector Machines: Slide 36
Suppose we're in 1-dimension
What would SVMs do with this data?
x = 0
Support Vector Machines: Slide 37
Suppose we're in 1-dimension
Not a big surprise: the boundary sits midway, with a positive "plane" on one side and a negative "plane" on the other.
x = 0
Support Vector Machines: Slide 38
Harder 1-dimensional dataset
That's wiped the smirk off SVM's face. What can be done about this?
x = 0
Support Vector Machines: Slide 39
Harder 1-dimensional dataset
Remember how permitting non-linear basis functions made linear regression so much nicer? Let's permit them here too:
$$ \mathbf{z}_k = (x_k, x_k^2) $$
Support Vector Machines: Slide 40 (same text as Slide 39; the figure shows the data after the lift, now linearly separable)
Support Vector Machines: Slide 41
Common SVM basis functions
z_k = ( polynomial terms of x_k of degree 1 to q )
z_k = ( radial basis functions of x_k ):
$$ z_k[j] = \varphi_j(\mathbf{x}_k) = \exp\left( -\frac{\|\mathbf{x}_k - \mathbf{c}_j\|^2}{2\sigma^2} \right) $$
z_k = ( sigmoid functions of x_k )
This is sensible. Is that the end of the story? No... there's one more trick!
Support Vector Machines: Slide 42
Quadratic Basis Functions

$$ \Phi(\mathbf{x}) = \left( 1, \;\; \sqrt{2}x_1, \sqrt{2}x_2, \ldots, \sqrt{2}x_m, \;\; x_1^2, x_2^2, \ldots, x_m^2, \;\; \sqrt{2}x_1x_2, \sqrt{2}x_1x_3, \ldots, \sqrt{2}x_{m-1}x_m \right) $$

Constant term; linear terms; pure quadratic terms; quadratic cross-terms.
Number of terms (assuming m input dimensions) = (m+2)-choose-2 = (m+2)(m+1)/2 ≈ m²/2.
You may be wondering what those √2's are doing.
• You should be happy that they do no harm.
• You'll find out why they're there soon.
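A direct transcription of Φ(x) (a sketch; the ordering of the cross-terms is arbitrary but must be fixed):

```python
import numpy as np

def phi_quad(x):
    """Phi(x): constant, sqrt(2)*linear, pure quadratic, sqrt(2)*cross terms."""
    x = np.asarray(x, dtype=float)
    m = len(x)
    cross = [np.sqrt(2) * x[i] * x[j] for i in range(m) for j in range(i + 1, m)]
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, cross])

print(len(phi_quad(np.zeros(3))))    # (m+2)(m+1)/2 = 10 terms for m = 3
print(len(phi_quad(np.zeros(100))))  # 5151, roughly m^2/2 for m = 100
```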
Support Vector Machines: Slide 43
QP (old)
Maximize $\sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl}$ where $Q_{kl} = y_k y_l (\mathbf{x}_k \cdot \mathbf{x}_l)$.
Subject to: $0 \le \alpha_k \le C \;\forall k$, $\sum_{k=1}^{R} \alpha_k y_k = 0$.
Then define $\mathbf{w} = \sum_{k=1}^{R} \alpha_k y_k \mathbf{x}_k$ and classify with f(x, w, b) = sign(w · x - b).
Support Vector Machines: Slide 44
QP with basis functions
Maximize $\sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl}$ where $Q_{kl} = y_k y_l \left( \Phi(\mathbf{x}_k) \cdot \Phi(\mathbf{x}_l) \right)$.
Subject to: $0 \le \alpha_k \le C \;\forall k$, $\sum_{k=1}^{R} \alpha_k y_k = 0$.
Then define $\mathbf{w} = \sum_{k \,\text{s.t.}\, \alpha_k > 0} \alpha_k y_k \Phi(\mathbf{x}_k)$ and classify with f(x, w, b) = sign(w · Φ(x) - b).
Most important change: x → Φ(x)
Support Vector Machines: Slide 45
QP with basis functions
(Same formulation as Slide 44.)
We must do R²/2 dot products to get this matrix ready. Each dot product requires m²/2 additions and multiplications. The whole thing costs R²m²/4. Yeeks!
...or does it?
Support Vector Machines: Slide 46
Quadratic Dot Products
Dotting the two quadratic-basis vectors term by term:

$$ \Phi(\mathbf{a}) \cdot \Phi(\mathbf{b}) = 1 + 2\sum_{i=1}^{m} a_i b_i + \sum_{i=1}^{m} a_i^2 b_i^2 + 2\sum_{i=1}^{m}\sum_{j=i+1}^{m} a_i a_j b_i b_j $$

(constant term, plus linear terms, plus pure quadratic terms, plus quadratic cross-terms)
QuadraticDot
Products
 )()( bΦaΦ
   

m
i
m
ij
jiji
m
i
ii
m
i
ii bbaababa
1 11
22
1
221
Just out of casual, innocent, interest,
let’s look at another function of a and
b:
2
)1.( ba
1.2).( 2
 baba
12
1
2
1






  
m
i
ii
m
i
ii baba
12
11 1
   
m
i
ii
m
i
m
j
jjii bababa
122)(
11 11
2
    
m
i
ii
m
i
m
ij
jjii
m
i
ii babababa
Support Vector Machines: Slide 48
Quadratic Dot Products

$$ \Phi(\mathbf{a}) \cdot \Phi(\mathbf{b}) = 1 + 2\sum_{i=1}^{m} a_i b_i + \sum_{i=1}^{m} a_i^2 b_i^2 + 2\sum_{i=1}^{m}\sum_{j=i+1}^{m} a_i a_j b_i b_j = (\mathbf{a} \cdot \mathbf{b} + 1)^2 $$

They're the same! And this is only O(m) to compute!
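A quick numerical check of the identity (a sketch with random vectors, repeating the Slide 42 mapping for self-containment):

```python
import numpy as np

def phi_quad(x):
    x = np.asarray(x, dtype=float)
    m = len(x)
    cross = [np.sqrt(2) * x[i] * x[j] for i in range(m) for j in range(i + 1, m)]
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, cross])

rng = np.random.default_rng(0)
a, b = rng.normal(size=5), rng.normal(size=5)
print(np.isclose(phi_quad(a) @ phi_quad(b), (a @ b + 1) ** 2))   # True
```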
Support Vector Machines: Slide 49
QP with Quintic basis functions
Maximize $\sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl}$ where $Q_{kl} = y_k y_l \left( \Phi(\mathbf{x}_k) \cdot \Phi(\mathbf{x}_l) \right)$.
Subject to: $0 \le \alpha_k \le C \;\forall k$, $\sum_{k=1}^{R} \alpha_k y_k = 0$.
Then define $\mathbf{w} = \sum_{k \,\text{s.t.}\, \alpha_k > 0} \alpha_k y_k \Phi(\mathbf{x}_k)$ and classify with f(x, w, b) = sign(w · Φ(x) - b).
Support Vector Machines: Slide 50
QP with Quadratic basis functions
Maximize $\sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl}$ where $Q_{kl} = y_k y_l K(\mathbf{x}_k, \mathbf{x}_l)$.
Subject to: $0 \le \alpha_k \le C \;\forall k$, $\sum_{k=1}^{R} \alpha_k y_k = 0$.
Then define $\mathbf{w} = \sum_{k \,\text{s.t.}\, \alpha_k > 0} \alpha_k y_k \Phi(\mathbf{x}_k)$ and
$$ b = y_K(1 - \varepsilon_K) - \mathbf{x}_K \cdot \mathbf{w} \quad \text{where } K = \operatorname*{argmax}_k \alpha_k $$
Then classify with: f(x, w, b) = sign(K(w, x) - b)
Most important change: $\Phi(\mathbf{x}_k) \cdot \Phi(\mathbf{x}_l) \to K(\mathbf{x}_k, \mathbf{x}_l)$
Support Vector Machines: Slide 51
Higher Order Polynomials

| Polynomial | Φ(x)                            | Cost to build Q_kl matrix traditionally | Cost if 100 inputs | Φ(a)·Φ(b) | Cost to build Q_kl matrix sneakily | Cost if 100 inputs |
|------------|---------------------------------|-----------------------------------------|--------------------|-----------|------------------------------------|--------------------|
| Quadratic  | all m²/2 terms up to degree 2   | m²R²/4                                  | 2,500 R²           | (a·b+1)²  | mR²/2                              | 50 R²              |
| Cubic      | all m³/6 terms up to degree 3   | m³R²/12                                 | 83,000 R²          | (a·b+1)³  | mR²/2                              | 50 R²              |
| Quartic    | all m⁴/24 terms up to degree 4  | m⁴R²/48                                 | 1,960,000 R²       | (a·b+1)⁴  | mR²/2                              | 50 R²              |
Support Vector Machines: Slide 52
SVM Kernel Functions
• K(a, b) = (a · b + 1)^d is an example of an SVM Kernel Function.
• Beyond polynomials there are other very high dimensional basis functions that can be made practical by finding the right Kernel Function.
• Radial-Basis-style Kernel Function:
$$ K(\mathbf{a}, \mathbf{b}) = \exp\left( -\frac{(\mathbf{a} - \mathbf{b})^2}{2\sigma^2} \right) $$
• Neural-net-style Kernel Function:
$$ K(\mathbf{a}, \mathbf{b}) = \tanh(\kappa\, \mathbf{a} \cdot \mathbf{b} - \delta) $$
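The three kernels as code (a sketch; d, σ, κ, δ are free parameters you would tune, not values from the slides):

```python
import numpy as np

def k_poly(a, b, d=2):
    return (a @ b + 1.0) ** d                              # polynomial kernel

def k_rbf(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))  # radial-basis style

def k_tanh(a, b, kappa=1.0, delta=1.0):
    return np.tanh(kappa * (a @ b) - delta)                # neural-net style

a, b = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(k_poly(a, b), k_rbf(a, b), k_tanh(a, b))
```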
Support Vector Machines: Slide 53
Kernel Tricks
• Replacing the dot product with a kernel function
• Not all functions are kernel functions
• Need to be decomposable: K(a, b) = Φ(a) · Φ(b)
• Could K(a, b) = (a - b)³ be a kernel function?
• Could K(a, b) = (a - b)⁴ - (a + b)² be a kernel function?
Support Vector Machines: Slide 54
Kernel Tricks
• Mercer's condition
To expand a kernel function K(x, y) into a dot product, i.e. K(x, y) = Φ(x) · Φ(y), K(x, y) has to be a positive semi-definite function, i.e., for any function f(x) whose $\int f(x)^2\, dx$ is finite, the following inequality holds:
$$ \int\!\!\int f(x)\, K(x, y)\, f(y)\, dx\, dy \ge 0 $$
• Could $K(\mathbf{x}, \mathbf{y}) = \sum_{i} (x_i y_i)^p$ be a kernel function?
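Mercer's condition is hard to verify analytically, but a cheap necessary check is that the Gram matrix on any finite sample must be positive semi-definite. A sketch (the second candidate, built from (a - b)-style differences, is deliberately not a kernel and fails the check):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

def gram(kernel, X):
    return np.array([[kernel(a, b) for b in X] for a in X])

K_good = gram(lambda a, b: (a @ b + 1.0) ** 2, X)             # polynomial kernel
K_bad = gram(lambda a, b: np.sum((a - b) ** 2) ** 1.5, X)     # ||a-b||^3: not a kernel

print(np.linalg.eigvalsh(K_good).min() >= -1e-8)   # True: PSD up to rounding
print(np.linalg.eigvalsh(K_bad).min() >= -1e-8)    # False: has negative eigenvalues
```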
Support Vector Machines: Slide 55
Kernel Tricks
• Pro
  • Introduces nonlinearity into the model
  • Computationally cheap
• Con
  • Still have potential overfitting problems
Support Vector Machines: Slide 56
Nonlinear Kernel (I) (figure only)
Support Vector Machines: Slide 57
Nonlinear Kernel (II) (figure only)
Support Vector Machines: Slide 58
Kernelize Logistic Regression
How can we introduce nonlinearity into logistic regression?

$$ p(y \mid \mathbf{x}) = \frac{1}{1 + \exp(-y\, \mathbf{x} \cdot \mathbf{w})} $$
$$ l_{reg} = \sum_{i=1}^{N} \log\left( 1 + \exp(-y_i\, \mathbf{x}_i \cdot \mathbf{w}) \right) + c\, \|\mathbf{w}\|^2 $$
Support Vector Machines: Slide 59
Kernelize Logistic Regression
• Representation Theorem: write w in the span of the mapped data points:
$$ \mathbf{w} = \sum_{i=1}^{N} \alpha_i \varphi(\mathbf{x}_i), \qquad \varphi(\mathbf{x}) \cdot \mathbf{w} = \sum_{i=1}^{N} \alpha_i K(\mathbf{x}, \mathbf{x}_i) $$
$$ p(y \mid \mathbf{x}) = \frac{1}{1 + \exp\left( -y \sum_{i=1}^{N} \alpha_i K(\mathbf{x}, \mathbf{x}_i) \right)} $$
$$ l_{reg} = \sum_{i=1}^{N} \log\left( 1 + \exp\left( -y_i \sum_{j=1}^{N} \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) \right) \right) + c \sum_{i,j=1}^{N} \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) $$
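A gradient-descent sketch of this kernelized objective (assumptions on our part: an RBF kernel, a fixed step size and iteration budget, and invented toy data; illustrative, not the slides' own algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1.0, -1.0)     # circular classes

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                                        # RBF Gram matrix K(x_i, x_j)
alpha, c, lr = np.zeros(len(y)), 1e-2, 1e-2

for _ in range(1000):
    margins = y * (K @ alpha)                    # y_i * sum_j alpha_j K(x_i, x_j)
    p = 1.0 / (1.0 + np.exp(margins))            # logistic-loss residuals
    grad = -K @ (y * p) + 2.0 * c * (K @ alpha)  # gradient of l_reg w.r.t. alpha
    alpha -= lr * grad

print(np.mean(np.sign(K @ alpha) == y))          # training accuracy
```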
Support Vector Machines: Slide 60
Overfitting in SVM
(Figures: classification error vs. polynomial kernel degree 1-5 on the Breast Cancer and Ionosphere datasets, plotting training error against testing error.)
Support Vector Machines: Slide 61
SVM Performance
• Anecdotally they work very very well indeed.
• Example: They are currently the best-known classifier on a well-studied hand-written-character recognition benchmark.
• Another example: Andrew knows several reliable people doing practical real-world work who claim that SVMs have saved them when their other favorite classifiers did poorly.
• There is a lot of excitement and religious fervor about SVMs as of 2001.
• Despite this, some practitioners are a little skeptical.
Support Vector Machines: Slide 62
Diffusion Kernel
• A kernel function describes the correlation or similarity between two data points.
• Suppose we have a function s(x, y) that describes the similarity between two data points, and assume that it is non-negative and symmetric. How can we generate a kernel function based on this similarity function?
• A graph theory approach ...
Support Vector Machines: Slide 63
Diffusion Kernel
• Create a graph for the data points: each vertex corresponds to a data point, and the weight of each edge is the similarity s(x, y).
• Graph Laplacian:
$$ L_{i,j} = \begin{cases} s(x_i, x_j) & i \ne j \\ -\sum_{k \ne i} s(x_i, x_k) & i = j \end{cases} $$
• Properties of the Laplacian: negative semi-definite.
Support Vector Machines: Slide 64
Diffusion Kernel
• Consider a simple Laplacian:
$$ L_{i,j} = \begin{cases} 1 & x_i \text{ and } x_j \text{ are connected} \\ -|N(x_i)| & i = j \end{cases} $$
(where N(x_i) is the set of neighbors of x_i)
• Consider L², L⁴, ... What do these matrices represent?
• A diffusion kernel
Support Vector Machines: Slide 65 (repeats Slide 64)
Support Vector Machines: Slide 66
Diffusion Kernel: Properties
• Positive definite
• Local relationships L induce global relationships
• Works for undirected weighted graphs with symmetric similarities: s(x_i, x_j) = s(x_j, x_i)
• How to compute the diffusion kernel?
$$ K = e^{\beta L}, \quad \text{or equivalently} \quad \frac{dK}{d\beta} = LK $$
(β denotes the diffusion parameter)
Support Vector Machines: Slide 67
Computing Diffusion Kernel
• Singular value decomposition of the (symmetric) Laplacian L:
$$ L = U \Sigma U^T = (\mathbf{u}_1, \ldots, \mathbf{u}_m) \operatorname{diag}(\lambda_1, \ldots, \lambda_m) (\mathbf{u}_1, \ldots, \mathbf{u}_m)^T = \sum_{i=1}^{m} \lambda_i \mathbf{u}_i \mathbf{u}_i^T, \qquad \mathbf{u}_i^T \mathbf{u}_j = \delta_{i,j} $$
• What is L²?
$$ L^2 = \left( \sum_{i=1}^{m} \lambda_i \mathbf{u}_i \mathbf{u}_i^T \right)^2 = \sum_{i,j=1}^{m} \lambda_i \lambda_j \mathbf{u}_i (\mathbf{u}_i^T \mathbf{u}_j) \mathbf{u}_j^T = \sum_{i=1}^{m} \lambda_i^2 \mathbf{u}_i \mathbf{u}_i^T $$
Support Vector Machines: Slide 68
Computing Diffusion Kernel
• What about Lⁿ?
$$ L^n = \sum_{i=1}^{m} \lambda_i^n \mathbf{u}_i \mathbf{u}_i^T $$
• Compute the diffusion kernel $e^{\beta L}$:
$$ e^{\beta L} = \sum_{n=0}^{\infty} \frac{\beta^n}{n!} L^n = \sum_{n=0}^{\infty} \frac{\beta^n}{n!} \sum_{i=1}^{m} \lambda_i^n \mathbf{u}_i \mathbf{u}_i^T = \sum_{i=1}^{m} \left( \sum_{n=0}^{\infty} \frac{(\beta \lambda_i)^n}{n!} \right) \mathbf{u}_i \mathbf{u}_i^T = \sum_{i=1}^{m} e^{\beta \lambda_i} \mathbf{u}_i \mathbf{u}_i^T $$
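Putting the last few slides together (a sketch; β is a free diffusion parameter, and the small graph is invented): build the simple Laplacian of Slide 64, eigendecompose it, and exponentiate the eigenvalues exactly as derived above.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],      # adjacency: 1 where x_i and x_j are connected
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = A - np.diag(A.sum(axis=1))   # off-diagonal 1s, -|N(x_i)| on the diagonal

lam, U = np.linalg.eigh(L)       # L = U diag(lam) U^T since L is symmetric
beta = 0.5
K = U @ np.diag(np.exp(beta * lam)) @ U.T   # K = e^{beta L} = sum_i e^{beta lam_i} u_i u_i^T

print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() > 0)   # symmetric, positive definite
```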
Support Vector Machines: Slide 69
Doing multi-class classification
• SVMs can only handle two-class outputs (i.e. a categorical output variable with arity 2).
• What can be done?
• Answer: with output arity N, learn N SVMs:
  • SVM 1 learns "Output == 1" vs "Output != 1"
  • SVM 2 learns "Output == 2" vs "Output != 2"
  • :
  • SVM N learns "Output == N" vs "Output != N"
• Then to predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.
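A from-scratch sketch of this one-vs-rest recipe (scikit-learn is assumed for the binary SVMs; the 3-class toy data are invented):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [4, 0], [0, 4]])
X = np.vstack([c + rng.normal(size=(30, 2)) for c in centers])
y = np.repeat([1, 2, 3], 30)

# one binary SVM per class: "Output == k" vs "Output != k"
models = {k: SVC(kernel='linear').fit(X, np.where(y == k, 1, -1)) for k in [1, 2, 3]}

def predict(x):
    scores = {k: m.decision_function(x.reshape(1, -1))[0] for k, m in models.items()}
    return max(scores, key=scores.get)   # furthest into the positive region

print(predict(np.array([3.8, 0.2])))     # should pick the class of the nearby cluster
```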
Support Vector Machines: Slide 70
Ranking Problem
• Consider a problem of ranking essays
• Three ranking categories: good, ok, bad
• Given an input document, predict its ranking category
• How should we formulate this problem?
  • A simple multi-class solution: each ranking category is an independent class
  • But there is something missing here ... we miss the ordinal relationship between classes!
Support Vector Machines: Slide 71
Ordinal Regression
• Which choice is better: w or w'?
• How could we formulate this problem?
(Figure: documents labeled 'good', 'OK', and 'bad' projected onto the two candidate weight vectors w and w'.)
Support Vector Machines: Slide 72
Ordinal Regression
• What are the two decision boundaries?
$$ \mathbf{w} \cdot \mathbf{x} + b_1 = 0 \quad \text{and} \quad \mathbf{w} \cdot \mathbf{x} + b_2 = 0 $$
• What is the margin for ordinal regression? (D_g, D_o, D_b are the 'good', 'OK', and 'bad' documents)
$$ \text{margin}_1(\mathbf{w}, b_1) = \min_{\mathbf{x} \in D_g \cup D_o} d(\mathbf{x}) = \min_{\mathbf{x} \in D_g \cup D_o} \frac{|\mathbf{x} \cdot \mathbf{w} + b_1|}{\sqrt{\sum_{i=1}^{d} w_i^2}} $$
$$ \text{margin}_2(\mathbf{w}, b_2) = \min_{\mathbf{x} \in D_o \cup D_b} d(\mathbf{x}) = \min_{\mathbf{x} \in D_o \cup D_b} \frac{|\mathbf{x} \cdot \mathbf{w} + b_2|}{\sqrt{\sum_{i=1}^{d} w_i^2}} $$
$$ \text{margin}(\mathbf{w}, b_1, b_2) = \min\left( \text{margin}_1(\mathbf{w}, b_1), \text{margin}_2(\mathbf{w}, b_2) \right) $$
• Maximize the margin:
$$ \{\mathbf{w}^*, b_1^*, b_2^*\} = \operatorname*{argmax}_{\mathbf{w}, b_1, b_2} \text{margin}(\mathbf{w}, b_1, b_2) $$
Support Vector Machines: Slides 73-74 (repeat the ordinal-regression margin formulation of Slide 72)
Support Vector Machines: Slide 75
Ordinal Regression
• How do we solve this monster?
$$ \{\mathbf{w}^*, b_1^*, b_2^*\} = \operatorname*{argmax}_{\mathbf{w}, b_1, b_2} \text{margin}(\mathbf{w}, b_1, b_2) = \operatorname*{argmax}_{\mathbf{w}, b_1, b_2} \min\left( \text{margin}_1(\mathbf{w}, b_1), \text{margin}_2(\mathbf{w}, b_2) \right) $$
$$ = \operatorname*{argmax}_{\mathbf{w}, b_1, b_2} \min\left( \min_{\mathbf{x} \in D_g \cup D_o} \frac{|\mathbf{x} \cdot \mathbf{w} + b_1|}{\sqrt{\sum_i w_i^2}}, \;\; \min_{\mathbf{x} \in D_o \cup D_b} \frac{|\mathbf{x} \cdot \mathbf{w} + b_2|}{\sqrt{\sum_i w_i^2}} \right) $$
subject to
$$ \forall \mathbf{x}_i \in D_g : \mathbf{x}_i \cdot \mathbf{w} + b_1 \ge 0 $$
$$ \forall \mathbf{x}_i \in D_o : \mathbf{x}_i \cdot \mathbf{w} + b_1 \le 0, \quad \mathbf{x}_i \cdot \mathbf{w} + b_2 \ge 0 $$
$$ \forall \mathbf{x}_i \in D_b : \mathbf{x}_i \cdot \mathbf{w} + b_2 \le 0 $$
Support Vector Machines: Slide 76
Ordinal Regression
• The same old trick. To remove the scaling invariance, set
$$ \min_{\mathbf{x}_i \in D_g \cup D_o} |\mathbf{x}_i \cdot \mathbf{w} + b_1| = 1, \qquad \min_{\mathbf{x}_i \in D_o \cup D_b} |\mathbf{x}_i \cdot \mathbf{w} + b_2| = 1 $$
• Now the problem is simplified as:
$$ \{\mathbf{w}^*, b_1^*, b_2^*\} = \operatorname*{argmin}_{\mathbf{w}, b_1, b_2} \sum_{i=1}^{d} w_i^2 $$
subject to
$$ \forall \mathbf{x}_i \in D_g : \mathbf{x}_i \cdot \mathbf{w} + b_1 \ge 1 $$
$$ \forall \mathbf{x}_i \in D_o : \mathbf{x}_i \cdot \mathbf{w} + b_1 \le -1, \quad \mathbf{x}_i \cdot \mathbf{w} + b_2 \ge 1 $$
$$ \forall \mathbf{x}_i \in D_b : \mathbf{x}_i \cdot \mathbf{w} + b_2 \le -1 $$
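A sketch of this simplified (separable) ordinal-regression QP in cvxopt (assumed tooling; the toy documents are invented). Variables are stacked as u = (w, b1, b2), and each one-sided constraint sign·(x · w + b) ≥ 1 becomes a row of G u ≤ h:

```python
import numpy as np
from cvxopt import matrix, solvers

Dg = np.array([[3.0, 3.2], [2.5, 3.8]])      # 'good' documents
Do = np.array([[0.1, 0.3], [-0.2, 0.5]])     # 'OK' documents
Db = np.array([[-3.0, -2.8], [-2.6, -3.3]])  # 'bad' documents
d = 2

def rows(D, b_idx, sign):
    """Encode sign * (x . w + b_{b_idx}) >= 1 as rows of G u <= h."""
    out = []
    for x in D:
        r = np.zeros(d + 2)
        r[:d] = -sign * x
        r[d + b_idx] = -sign
        out.append(r)
    return out

G = np.array(rows(Dg, 0, +1) + rows(Do, 0, -1) + rows(Do, 1, +1) + rows(Db, 1, -1))
h = -np.ones(len(G))
P = np.diag([1.0] * d + [1e-8, 1e-8])        # minimize (1/2)||w||^2; tiny ridge on b1, b2

solvers.options['show_progress'] = False
sol = solvers.qp(matrix(P), matrix(np.zeros(d + 2)), matrix(G), matrix(h))
u = np.array(sol['x']).ravel()
print('w =', u[:d], 'b1 =', u[d], 'b2 =', u[d + 1])   # expect b1 < b2 (cf. Slide 78)
```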
Support Vector Machines: Slide 77
Ordinal Regression
• Noisy case: add slack variables.
$$ \{\mathbf{w}^*, b_1^*, b_2^*\} = \operatorname*{argmin}_{\mathbf{w}, b_1, b_2} \sum_{i=1}^{d} w_i^2 + c \sum_{\mathbf{x}_i \in D_g} \varepsilon_i + c \sum_{\mathbf{x}_i \in D_o} (\varepsilon_i + \varepsilon_i') + c \sum_{\mathbf{x}_i \in D_b} \varepsilon_i $$
subject to
$$ \forall \mathbf{x}_i \in D_g : \mathbf{x}_i \cdot \mathbf{w} + b_1 \ge 1 - \varepsilon_i, \; \varepsilon_i \ge 0 $$
$$ \forall \mathbf{x}_i \in D_o : \mathbf{x}_i \cdot \mathbf{w} + b_1 \le -1 + \varepsilon_i, \; \mathbf{x}_i \cdot \mathbf{w} + b_2 \ge 1 - \varepsilon_i', \; \varepsilon_i \ge 0, \; \varepsilon_i' \ge 0 $$
$$ \forall \mathbf{x}_i \in D_b : \mathbf{x}_i \cdot \mathbf{w} + b_2 \le -1 + \varepsilon_i, \; \varepsilon_i \ge 0 $$
• Is this sufficient?
Support Vector Machines: Slide 78
Ordinal Regression
The missing piece from Slide 77: also enforce the ordering of the two thresholds,
$$ b_1 \le b_2 $$
(Otherwise the objective and constraints are the same as Slide 77.)
Support Vector Machines: Slide 79
References
• An excellent tutorial on VC-dimension and Support Vector Machines:
C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
http://citeseer.nj.nec.com/burges98tutorial.html
• The VC/SRM/SVM Bible (not for beginners, myself included):
Statistical Learning Theory by Vladimir Vapnik, Wiley-Interscience, 1998.
• Software: SVM-light, http://svmlight.joachims.org/, free download
Employee wellbeing at the workplace.pptx
 

Support Vector Machines Explained

  • 1. Nov 23rd, 2001Copyright © 2001, 2003, Andrew W. Moore Support Vector Machines Modified from the slides by Dr. Andrew W. Moore http://www.cs.cmu.edu/~awm/tutorials
  • 2. Support Vector Machines: Slide 2Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers fx a yest denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) How would you classify this data?
  • 3. Support Vector Machines: Slide 3Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers fx a yest denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) How would you classify this data?
  • 4. Support Vector Machines: Slide 4Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers fx a yest denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) How would you classify this data?
  • 5. Support Vector Machines: Slide 5Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers fx a yest denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) How would you classify this data?
  • 6. Support Vector Machines: Slide 6Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers fx a yest denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) Any of these would be fine.. ..but which is best?
  • 7. Support Vector Machines: Slide 7Copyright © 2001, 2003, Andrew W. Moore Classifier Margin fx a yest denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
  • 8. Support Vector Machines: Slide 8Copyright © 2001, 2003, Andrew W. Moore Maximum Margin fx a yest denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (Called an LSVM) Linear SVM
  • 9. Support Vector Machines: Slide 9Copyright © 2001, 2003, Andrew W. Moore Maximum Margin fx a yest denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (Called an LSVM) Support Vectors are those datapoints that the margin pushes up against Linear SVM
  • 10. Support Vector Machines: Slide 10Copyright © 2001, 2003, Andrew W. Moore Why Maximum Margin? denotes +1 denotes -1 f(x,w,b) = sign(w. x - b) The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (Called an LSVM) Support Vectors are those datapoints that the margin pushes up against 1. Intuitively this feels safest. 2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction) this gives us least chance of causing a misclassification. 3. LOOCV is easy since the model is immune to removal of any non- support-vector datapoints. 4. There’s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing. 5. Empirically it works very very well.
  • 11. Estimate the Margin. What is the distance expression for a point x to the line $\mathbf{w}\cdot\mathbf{x}+b=0$? $d(\mathbf{x}) = \frac{|\mathbf{w}\cdot\mathbf{x}+b|}{\|\mathbf{w}\|_2} = \frac{|\mathbf{w}\cdot\mathbf{x}+b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}$.
  • 12. Estimate the Margin. What is the expression for the margin? $\text{margin} = \min_{\mathbf{x}\in D} d(\mathbf{x}) = \min_{\mathbf{x}\in D} \frac{|\mathbf{w}\cdot\mathbf{x}+b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}$.
  • 13. Maximize Margin. $\underset{\mathbf{w},b}{\operatorname{argmax}}\ \text{margin}(\mathbf{w},b,D) = \underset{\mathbf{w},b}{\operatorname{argmax}} \min_{\mathbf{x}_i\in D} d(\mathbf{x}_i) = \underset{\mathbf{w},b}{\operatorname{argmax}} \min_{\mathbf{x}_i\in D} \frac{|\mathbf{w}\cdot\mathbf{x}_i+b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}$.
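To make the formula above concrete, here is a minimal numpy sketch (not from the slides; the four points and the candidate (w, b) are made up) that evaluates the margin definition directly:

    import numpy as np

    # Toy data and a candidate (w, b); dataset and parameters are made up.
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
    y = np.array([+1, +1, -1, -1])
    w, b = np.array([1.0, 1.0]), 0.0

    dist = np.abs(X @ w + b) / np.linalg.norm(w)   # d(x) = |w.x + b| / ||w||_2
    print("separates data:", bool(np.all(y * (X @ w + b) > 0)))
    print("margin        :", dist.min())           # width before hitting a datapoint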
  • 14. Maximize Margin. $\underset{\mathbf{w},b}{\operatorname{argmax}} \min_{\mathbf{x}_i\in D} \frac{|\mathbf{w}\cdot\mathbf{x}_i+b|}{\sqrt{\sum_{i=1}^{d} w_i^2}}$ subject to $\forall\mathbf{x}_i\in D: y_i(\mathbf{w}\cdot\mathbf{x}_i+b)\ge 0$. A min-max problem, i.e. a game problem.
  • 15. Maximize Margin. Same problem as slide 14. Strategy: fix the scale by requiring $\forall\mathbf{x}_i\in D: |\mathbf{w}\cdot\mathbf{x}_i+b|\ge 1$; the problem then becomes $\underset{\mathbf{w},b}{\operatorname{argmin}}\ \sum_{i=1}^{d} w_i^2$ subject to $\forall\mathbf{x}_i\in D: y_i(\mathbf{w}\cdot\mathbf{x}_i+b)\ge 1$.
  • 16. Maximum Margin Linear Classifier. How to solve it? $\{\mathbf{w}^*,b^*\} = \underset{\mathbf{w},b}{\operatorname{argmin}}\ \sum_{k=1}^{d} w_k^2$ subject to $y_1(\mathbf{w}\cdot\mathbf{x}_1+b)\ge 1$, $y_2(\mathbf{w}\cdot\mathbf{x}_2+b)\ge 1$, …, $y_N(\mathbf{w}\cdot\mathbf{x}_N+b)\ge 1$.
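The following hedged sketch hands this exact primal QP to a generic solver. It assumes the cvxopt package and a made-up four-point dataset; the QP variable stacks w and b, and a tiny ridge on b is added purely so the quadratic term stays numerically positive semi-definite:

    import numpy as np
    from cvxopt import matrix, solvers  # assumes cvxopt is installed

    solvers.options["show_progress"] = False

    # Made-up separable data; the QP variable is u = (w_1, ..., w_d, b).
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
    y = np.array([+1.0, +1.0, -1.0, -1.0])
    N, d = X.shape

    P = np.zeros((d + 1, d + 1))
    P[:d, :d] = 2.0 * np.eye(d)   # quadratic cost on w only: sum_k w_k^2
    P[d, d] = 1e-8                # tiny ridge on b keeps P numerically PSD
    q = np.zeros(d + 1)
    G = -y[:, None] * np.hstack([X, np.ones((N, 1))])  # -y_i (x_i, 1).u <= -1
    h = -np.ones(N)

    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    u = np.array(sol["x"]).ravel()
    w, b = u[:d], u[d]
    print("w =", w, "b =", b)
    print("y_i (w.x_i + b) =", y * (X @ w + b))   # all >= 1 at the optimum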
  • 17. Learning via Quadratic Programming. QP is a well-studied class of optimization algorithms to maximize a quadratic function of some real-valued variables subject to linear constraints.
  • 18–19. Quadratic Programming (slide 19 repeats slide 18). Find $\underset{\mathbf{u}}{\operatorname{argmin}}\ c + \mathbf{d}^T\mathbf{u} + \frac{\mathbf{u}^T R\,\mathbf{u}}{2}$ (the quadratic criterion), subject to $n$ linear inequality constraints $a_{11}u_1 + a_{12}u_2 + \ldots + a_{1m}u_m \le b_1$, $a_{21}u_1 + a_{22}u_2 + \ldots + a_{2m}u_m \le b_2$, …, $a_{n1}u_1 + a_{n2}u_2 + \ldots + a_{nm}u_m \le b_n$, and subject to $e$ additional linear equality constraints $a_{(n+1)1}u_1 + \ldots + a_{(n+1)m}u_m = b_{(n+1)}$, …, $a_{(n+e)1}u_1 + \ldots + a_{(n+e)m}u_m = b_{(n+e)}$.
  • 20. Quadratic Programming. $\{\mathbf{w}^*,b^*\} = \underset{\mathbf{w},b}{\operatorname{argmin}}\ \sum_i w_i^2$ subject to $y_i(\mathbf{w}\cdot\mathbf{x}_i+b)\ge 1$ for all training data $(\mathbf{x}_i,y_i)$; equivalently $\{\mathbf{w}^*,b^*\} = \underset{\mathbf{w},b}{\operatorname{argmin}}\ \mathbf{w}^T\mathbf{w}$ subject to the $N$ inequality constraints $y_1(\mathbf{w}\cdot\mathbf{x}_1+b)\ge 1$, …, $y_N(\mathbf{w}\cdot\mathbf{x}_N+b)\ge 1$.
  • 21. Uh-oh! This is going to be a problem! What should we do?
  • 22. Uh-oh! This is going to be a problem! What should we do? Idea 1: Find minimum w.w, while minimizing number of training set errors. Problemette: Two things to minimize makes for an ill-defined optimization.
  • 23. Uh-oh! This is going to be a problem! What should we do? Idea 1.1: Minimize w.w + C (#train errors), where C is a tradeoff parameter. There's a serious practical problem that's about to make us reject this approach. Can you guess what it is?
  • 24. Uh-oh! This is going to be a problem! What should we do? Idea 1.1: Minimize w.w + C (#train errors), where C is a tradeoff parameter. The serious practical problem: this can't be expressed as a Quadratic Programming problem, so solving it may be too slow. (Also, it doesn't distinguish between disastrous errors and near misses.)
  • 25. Uh-oh! This is going to be a problem! What should we do? Idea 2.0: Minimize w.w + C (distance of error points to their correct place).
  • 26. Support Vector Machine (SVM) for Noisy Data. Any problem with the above formulation? $\{\mathbf{w}^*,b^*\} = \underset{\mathbf{w},b,\varepsilon}{\operatorname{argmin}}\ \sum_{i=1}^{d} w_i^2 + c\sum_{j=1}^{N}\varepsilon_j$ subject to $y_1(\mathbf{w}\cdot\mathbf{x}_1+b)\ge 1-\varepsilon_1$, $y_2(\mathbf{w}\cdot\mathbf{x}_2+b)\ge 1-\varepsilon_2$, …, $y_N(\mathbf{w}\cdot\mathbf{x}_N+b)\ge 1-\varepsilon_N$. (The problem: nothing yet stops the $\varepsilon_j$ from going negative.)
  • 27. Support Vector Machine (SVM) for Noisy Data. Balance the trade-off between margin and classification errors: $\{\mathbf{w}^*,b^*\} = \underset{\mathbf{w},b,\varepsilon}{\operatorname{argmin}}\ \sum_{i=1}^{d} w_i^2 + c\sum_{j=1}^{N}\varepsilon_j$ subject to $y_1(\mathbf{w}\cdot\mathbf{x}_1+b)\ge 1-\varepsilon_1,\ \varepsilon_1\ge 0$; $y_2(\mathbf{w}\cdot\mathbf{x}_2+b)\ge 1-\varepsilon_2,\ \varepsilon_2\ge 0$; …; $y_N(\mathbf{w}\cdot\mathbf{x}_N+b)\ge 1-\varepsilon_N,\ \varepsilon_N\ge 0$.
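As an illustration of the trade-off, a short sketch assuming scikit-learn (whose SVC parameter C plays the role of the slide's c) on made-up Gaussian blobs: small C tolerates slack and keeps a wide margin, large C approaches the hard-margin solution:

    import numpy as np
    from sklearn.svm import SVC  # assumes scikit-learn

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(+2, 1.0, (50, 2)), rng.normal(-2, 1.0, (50, 2))])
    y = np.array([+1] * 50 + [-1] * 50)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        print(f"C={C:<6} support vectors={len(clf.support_)} "
              f"train accuracy={clf.score(X, y):.2f}")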
  • 28. Support Vector Machine for Noisy Data. $\{\mathbf{w}^*,b^*\} = \underset{\mathbf{w},b,\varepsilon}{\operatorname{argmin}}\ \sum_{i=1}^{d} w_i^2 + c\sum_{j=1}^{N}\varepsilon_j$ subject to the $N$ inequality constraints $y_i(\mathbf{w}\cdot\mathbf{x}_i+b)\ge 1-\varepsilon_i,\ \varepsilon_i\ge 0$ for $i = 1,\ldots,N$. How do we determine the appropriate value for c?
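One standard, hedged answer to the slide's question: pick c by cross-validation, since no closed form gives the right trade-off. A sketch assuming scikit-learn and an arbitrary grid of candidate values:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    search = GridSearchCV(SVC(kernel="linear"),
                          param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                          cv=5).fit(X, y)
    print("best C:", search.best_params_["C"],
          "| cross-validated accuracy:", round(search.best_score_, 3))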
  • 29. The Dual Form of QP. Maximize $\sum_{k=1}^{R}\alpha_k - \frac{1}{2}\sum_{k=1}^{R}\sum_{l=1}^{R}\alpha_k\alpha_l Q_{kl}$ where $Q_{kl} = y_k y_l(\mathbf{x}_k\cdot\mathbf{x}_l)$, subject to the constraints $0\le\alpha_k\le C$ for all $k$ and $\sum_{k=1}^{R}\alpha_k y_k = 0$. Then define $\mathbf{w} = \sum_{k=1}^{R}\alpha_k y_k\mathbf{x}_k$ and classify with f(x,w,b) = sign(w.x - b).
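A sketch of this dual, again assuming cvxopt and made-up data. cvxopt minimizes, so the dual is recast as minimizing $\frac{1}{2}\alpha^T Q\alpha - \mathbf{1}^T\alpha$ with box constraints $0\le\alpha_k\le C$ and the equality $\sum_k\alpha_k y_k=0$; w and b are then recovered as the slides prescribe:

    import numpy as np
    from cvxopt import matrix, solvers  # assumes cvxopt; data below is made up

    solvers.options["show_progress"] = False

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
    y = np.array([+1.0, +1.0, -1.0, -1.0])
    R, C = len(y), 10.0

    Q = (y[:, None] * y[None, :]) * (X @ X.T)           # Q_kl = y_k y_l (x_k . x_l)
    sol = solvers.qp(matrix(Q + 1e-8 * np.eye(R)),      # min 1/2 a^T Q a - 1^T a
                     matrix(-np.ones(R)),
                     matrix(np.vstack([-np.eye(R), np.eye(R)])),   # 0 <= a_k <= C
                     matrix(np.hstack([np.zeros(R), C * np.ones(R)])),
                     matrix(y[None, :]), matrix(np.zeros(1)))      # sum a_k y_k = 0
    alpha = np.array(sol["x"]).ravel()

    w = (alpha * y) @ X                                 # w = sum_k a_k y_k x_k
    sv = alpha > 1e-6                                   # support vectors: a_k > 0
    b = np.mean(X[sv] @ w - y[sv])                      # so sign(w.x - b) classifies
    print("alpha =", alpha.round(4))
    print("w =", w, "b =", round(b, 4))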
  • 30. The Dual Form of QP. Repeats the dual of slide 29 (same objective, constraints, and definition of $\mathbf{w}$), without the classification rule.
  • 31. An Equivalent QP. The same dual as slide 29, but note: datapoints with $\alpha_k > 0$ will be the support vectors, so the sum $\mathbf{w} = \sum_{k\,\text{s.t.}\,\alpha_k>0}\alpha_k y_k\mathbf{x}_k$ only needs to run over the support vectors.
  • 32. Support Vectors. The planes $\mathbf{w}\cdot\mathbf{x}+b = 1$ and $\mathbf{w}\cdot\mathbf{x}+b = -1$ touch the support vectors; the decision boundary is determined only by those support vectors! $\mathbf{w} = \sum_k\alpha_k y_k\mathbf{x}_k$ with, for each $i$, $\alpha_i\big(y_i(\mathbf{w}\cdot\mathbf{x}_i+b)-1\big) = 0$: $\alpha_i = 0$ for non-support vectors, $\alpha_i > 0$ for support vectors.
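A quick empirical check of this claim, assuming scikit-learn and made-up blobs: refitting after deleting a non-support-vector point leaves (w, b) essentially unchanged:

    import numpy as np
    from sklearn.svm import SVC  # assumes scikit-learn; data is made up

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(+2, 0.8, (40, 2)), rng.normal(-2, 0.8, (40, 2))])
    y = np.array([+1] * 40 + [-1] * 40)

    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    print("support vectors:", clf.support_)
    print("w:", clf.coef_.ravel(), "b:", clf.intercept_)

    # Drop one non-support-vector point: the refitted boundary is unchanged.
    non_sv = np.setdiff1d(np.arange(len(y)), clf.support_)[0]
    keep = np.arange(len(y)) != non_sv
    clf2 = SVC(kernel="linear", C=1.0).fit(X[keep], y[keep])
    print("after removal w:", clf2.coef_.ravel(), "b:", clf2.intercept_)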
  • 33. The Dual Form of QP. The dual of slide 29 once more; classify with f(x,w,b) = sign(w.x - b). How to determine b?
  • 34. An Equivalent QP: Determine b. Fix $\mathbf{w}$ from the dual solution; then $b^* = \underset{b,\varepsilon}{\operatorname{argmin}}\ \sum_{j=1}^{N}\varepsilon_j$ subject to $y_i(\mathbf{w}\cdot\mathbf{x}_i+b)\ge 1-\varepsilon_i,\ \varepsilon_i\ge 0$ for all $i$. A linear programming problem!
  • 35. An Equivalent QP. Maximize the dual of slide 31; then define $\mathbf{w} = \sum_{k\,\text{s.t.}\,\alpha_k>0}\alpha_k y_k\mathbf{x}_k$ (the sum only needs to be over the support vectors) and $b = y_K(1-\varepsilon_K) - \mathbf{x}_K\cdot\mathbf{w}$ where $K = \operatorname{argmax}_k\alpha_k$; classify with f(x,w,b) = sign(w.x - b). Why did I tell you about this equivalent QP? (1) It's a formulation that QP packages can optimize more quickly. (2) Because of further jaw-dropping developments you're about to learn.
  • 36. Suppose we're in 1-dimension. What would SVMs do with this data?
  • 37. Suppose we're in 1-dimension. Not a big surprise: we get a positive "plane" and a negative "plane".
  • 38. Harder 1-dimensional dataset. That's wiped the smirk off SVM's face. What can be done about this?
  • 39. Harder 1-dimensional dataset. Remember how permitting non-linear basis functions made linear regression so much nicer? Let's permit them here too: $\mathbf{z}_k = (x_k,\ x_k^2)$.
  • 40. Harder 1-dimensional dataset. Repeats slide 39: with the map $\mathbf{z}_k = (x_k,\ x_k^2)$ the lifted data become linearly separable.
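A two-line check of the slide's basis trick on made-up 1-D data (the threshold x² = 1 is chosen by eye):

    import numpy as np

    # Negatives far from the origin, positives near it: not separable on the line.
    x = np.array([-3.0, -2.0, 2.0, 3.0, -0.5, 0.0, 0.5])
    y = np.array([-1, -1, -1, -1, +1, +1, +1])

    Z = np.column_stack([x, x ** 2])          # z_k = (x_k, x_k^2)
    # In (x, x^2) space the horizontal line x^2 = 1 separates the classes:
    print(bool(np.all((Z[:, 1] < 1) == (y == +1))))   # True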
  • 41. Common SVM basis functions. $\mathbf{z}_k$ = (polynomial terms of $\mathbf{x}_k$ of degree 1 to q); $\mathbf{z}_k$ = (radial basis functions of $\mathbf{x}_k$), i.e. $z_k[j] = \phi_j(\mathbf{x}_k) = \exp\!\left(-\frac{|\mathbf{x}_k-\mathbf{c}_j|^2}{2\sigma^2}\right)$; $\mathbf{z}_k$ = (sigmoid functions of $\mathbf{x}_k$). This is sensible. Is that the end of the story? No… there's one more trick!
  • 42. Quadratic Basis Functions. $\Phi(\mathbf{x}) = \left(1,\ \sqrt{2}x_1,\ldots,\sqrt{2}x_m,\ x_1^2,\ldots,x_m^2,\ \sqrt{2}x_1x_2,\ \sqrt{2}x_1x_3,\ldots,\sqrt{2}x_{m-1}x_m\right)$: a constant term, linear terms, pure quadratic terms, and quadratic cross-terms. Number of terms (assuming m input dimensions) = (m+2)-choose-2 = (m+2)(m+1)/2 = (as near as makes no difference) m²/2. You may be wondering what those $\sqrt{2}$'s are doing. You should be happy that they do no harm; you'll find out why they're there soon.
  • 43. QP (old). The dual of slide 29 restated for comparison: maximize $\sum_k\alpha_k - \frac{1}{2}\sum_k\sum_l\alpha_k\alpha_l Q_{kl}$ with $Q_{kl} = y_ky_l(\mathbf{x}_k\cdot\mathbf{x}_l)$, subject to $0\le\alpha_k\le C$ and $\sum_k\alpha_ky_k = 0$; define $\mathbf{w} = \sum_{k\,\text{s.t.}\,\alpha_k>0}\alpha_ky_k\mathbf{x}_k$; classify with f(x,w,b) = sign(w.x - b).
  • 44. QP with basis functions. The same dual, but now $Q_{kl} = y_ky_l\big(\Phi(\mathbf{x}_k)\cdot\Phi(\mathbf{x}_l)\big)$; define $\mathbf{w} = \sum_{k\,\text{s.t.}\,\alpha_k>0}\alpha_ky_k\Phi(\mathbf{x}_k)$; classify with f(x,w,b) = sign(w.Φ(x) - b). Most important change: $\mathbf{x} \to \Phi(\mathbf{x})$.
  • 45. QP with basis functions (continued). We must do $R^2/2$ dot products to get the $Q_{kl}$ matrix ready; each dot product requires $m^2/2$ additions and multiplications, so the whole thing costs $R^2m^2/4$. Yeeks! …or does it?
  • 46. Quadratic Dot Products. Expanding term by term: $\Phi(\mathbf{a})\cdot\Phi(\mathbf{b}) = 1 + 2\sum_{i=1}^{m}a_ib_i + \sum_{i=1}^{m}a_i^2b_i^2 + 2\sum_{i=1}^{m}\sum_{j=i+1}^{m}a_ia_jb_ib_j$.
  • 47. Quadratic Dot Products. Just out of casual, innocent interest, let's look at another function of a and b: $(\mathbf{a}\cdot\mathbf{b}+1)^2 = (\mathbf{a}\cdot\mathbf{b})^2 + 2\,\mathbf{a}\cdot\mathbf{b} + 1 = \left(\sum_i a_ib_i\right)^2 + 2\sum_i a_ib_i + 1 = \sum_i\sum_j a_ia_jb_ib_j + 2\sum_i a_ib_i + 1 = \sum_i a_i^2b_i^2 + 2\sum_i\sum_{j>i} a_ia_jb_ib_j + 2\sum_i a_ib_i + 1$.
  • 48. Quadratic Dot Products. Compare the expansion of $\Phi(\mathbf{a})\cdot\Phi(\mathbf{b})$ from slide 46 with the expansion of $(\mathbf{a}\cdot\mathbf{b}+1)^2$ from slide 47: they're the same! And this is only O(m) to compute!
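A numeric confirmation of the identity, with the √2-weighted expansion from slide 42 written out explicitly (m = 20 is arbitrary):

    import numpy as np

    def quadratic_phi(x):
        """Slide 42's expansion: 1, sqrt(2) x_i, x_i^2, sqrt(2) x_i x_j (i < j)."""
        x = np.asarray(x, float)
        cross = [np.sqrt(2) * x[i] * x[j]
                 for i in range(len(x)) for j in range(i + 1, len(x))]
        return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, cross])

    rng = np.random.default_rng(0)
    a, b = rng.normal(size=20), rng.normal(size=20)
    print(len(quadratic_phi(a)))                          # (m+2)(m+1)/2 = 231
    print(np.isclose(quadratic_phi(a) @ quadratic_phi(b),
                     (a @ b + 1.0) ** 2))                 # True: same dot product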
  • 49. QP with Quintic basis functions. The dual of slide 44 again, now with $\Phi$ containing all polynomial terms up to degree 5; everything else (constraints, $\mathbf{w}$, the classifier sign(w.Φ(x) - b)) is unchanged.
  • 50. QP with Quadratic basis functions. The same dual, but now $Q_{kl} = y_ky_l\,K(\mathbf{x}_k,\mathbf{x}_l)$; define $\mathbf{w} = \sum_{k\,\text{s.t.}\,\alpha_k>0}\alpha_ky_k\Phi(\mathbf{x}_k)$ and $b = y_K(1-\varepsilon_K) - \mathbf{x}_K\cdot\mathbf{w}$ where $K = \operatorname{argmax}_k\alpha_k$; classify with f(x,w,b) = sign(K(w, x) - b). Most important change: the kernel $K(\mathbf{x}_k,\mathbf{x}_l) = \Phi(\mathbf{x}_k)\cdot\Phi(\mathbf{x}_l)$ replaces the explicit dot product.
  • 51. Higher Order Polynomials.

    Polynomial | Φ(x)                          | Cost to build Q_kl traditionally | Cost if 100 inputs | Φ(a).Φ(b) | Cost to build Q_kl sneakily | Cost if 100 inputs
    Quadratic  | all m²/2 terms up to degree 2 | m²R²/4                           | 2,500 R²           | (a.b+1)²  | mR²/2                       | 50 R²
    Cubic      | all m³/6 terms up to degree 3 | m³R²/12                          | 83,000 R²          | (a.b+1)³  | mR²/2                       | 50 R²
    Quartic    | all m⁴/24 terms up to degree 4| m⁴R²/48                          | 1,960,000 R²       | (a.b+1)⁴  | mR²/2                       | 50 R²
  • 52. SVM Kernel Functions. $K(\mathbf{a},\mathbf{b}) = (\mathbf{a}\cdot\mathbf{b}+1)^d$ is an example of an SVM kernel function. Beyond polynomials there are other very high dimensional basis functions that can be made practical by finding the right kernel function. Radial-basis-style kernel function: $K(\mathbf{a},\mathbf{b}) = \exp\!\left(-\frac{(\mathbf{a}-\mathbf{b})^2}{2\sigma^2}\right)$. Neural-net-style kernel function: $K(\mathbf{a},\mathbf{b}) = \tanh(\kappa\,\mathbf{a}\cdot\mathbf{b} - \delta)$.
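Both kernel families as plain functions, fed to scikit-learn's SVC, which accepts a callable that returns the Gram matrix (the dataset, σ, and degree are arbitrary choices of this sketch):

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC  # SVC accepts a callable kernel

    def poly_kernel(A, B, d=2):
        return (A @ B.T + 1.0) ** d                      # K(a,b) = (a.b + 1)^d

    def rbf_kernel(A, B, sigma=1.0):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * sigma ** 2))          # K(a,b) = exp(-|a-b|^2/2s^2)

    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
    for name, k in (("poly", poly_kernel), ("rbf", rbf_kernel)):
        print(name, "train accuracy:", SVC(kernel=k).fit(X, y).score(X, y))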
  • 53. Kernel Tricks. Replacing the dot product with a kernel function. Not all functions are kernel functions: they need to be decomposable as $K(\mathbf{a},\mathbf{b}) = \Phi(\mathbf{a})\cdot\Phi(\mathbf{b})$. Could $K(\mathbf{a},\mathbf{b}) = (\mathbf{a}-\mathbf{b})^3$ be a kernel function? Could $K(\mathbf{a},\mathbf{b}) = (\mathbf{a}-\mathbf{b})^4 - (\mathbf{a}+\mathbf{b})^2$ be a kernel function?
  • 54. Kernel Tricks. Mercer's condition: to expand a kernel function K(x,y) into a dot product, i.e. $K(\mathbf{x},\mathbf{y}) = \Phi(\mathbf{x})\cdot\Phi(\mathbf{y})$, K(x,y) has to be a positive semi-definite function, i.e., for any function f(x) with finite $\int f(x)^2\,dx$, the inequality $\iint f(x)\,K(x,y)\,f(y)\,dx\,dy \ge 0$ must hold. Could $K(\mathbf{x},\mathbf{y}) = \sum_i x_i^p y_i^p$ be a kernel function?
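A finite-sample companion to Mercer's condition: the Gram matrix of a true kernel must be positive semi-definite on any sample, so a single negative eigenvalue settles slide 53's second question in the negative. A sketch on made-up scalar data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=30)                  # scalar datapoints
    A, B = x[:, None], x[None, :]

    def min_eig(K):
        return float(np.linalg.eigvalsh((K + K.T) / 2).min())

    # (ab+1)^2 is a genuine kernel, so its Gram matrix is PSD (min eig >= ~0).
    print("(ab+1)^2          min eig:", min_eig((A * B + 1.0) ** 2))
    # (a-b)^4 - (a+b)^2 has negative diagonal entries -4a^2, so it cannot be PSD.
    print("(a-b)^4 - (a+b)^2 min eig:", min_eig((A - B) ** 4 - (A + B) ** 2))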
  • 55. Kernel Tricks. Pro: introduces nonlinearity into the model; computationally cheap. Con: still has potential overfitting problems.
  • 56. Nonlinear Kernel (I). [Figure only.]
  • 57. Nonlinear Kernel (II). [Figure only.]
  • 58. Kernelize Logistic Regression. How can we introduce nonlinearity into logistic regression? $p(y|\mathbf{x}) = \frac{1}{1+\exp(-y\,\mathbf{x}\cdot\mathbf{w})}$, with regularized loss $l_{reg}(\mathbf{w}) = \sum_{i=1}^{N}\log\big(1+\exp(-y_i\,\mathbf{x}_i\cdot\mathbf{w})\big) + c\sum_k w_k^2$.
  • 59. Kernelize Logistic Regression. Representation theorem: write $\mathbf{w} = \sum_{i=1}^{N}\alpha_i\Phi(\mathbf{x}_i)$, so $\Phi(\mathbf{x})\cdot\mathbf{w} = \sum_{i=1}^{N}\alpha_i K(\mathbf{x},\mathbf{x}_i)$. Then $p(y|\mathbf{x}) = \frac{1}{1+\exp\left(-y\sum_i\alpha_i K(\mathbf{x},\mathbf{x}_i)\right)}$ and $l_{reg}(\alpha) = \sum_{i=1}^{N}\log\Big(1+\exp\big(-y_i\sum_j\alpha_j K(\mathbf{x}_i,\mathbf{x}_j)\big)\Big) + c\sum_{i,j}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j)$.
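A hedged sketch of this kernelized logistic regression, trained by plain gradient descent on α (the RBF bandwidth, step size, iteration count, and regularizer c are all arbitrary assumptions; convergence is not tuned):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(+1.5, 1.0, (40, 2)), rng.normal(-1.5, 1.0, (40, 2))])
    y = np.array([+1.0] * 40 + [-1.0] * 40)

    K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)  # RBF Gram matrix
    alpha, c, lr = np.zeros(len(y)), 1e-2, 1e-2
    for _ in range(1000):
        s = 1.0 / (1.0 + np.exp(y * (K @ alpha)))   # sigmoid(-y_i sum_j a_j K_ij)
        alpha -= lr * (-K @ (y * s) + 2.0 * c * (K @ alpha))  # gradient of l_reg

    print("train accuracy:", float((np.sign(K @ alpha) == y).mean()))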
  • 60. Overfitting in SVM. [Figure: bar charts of training error vs. testing error against polynomial kernel degree (1 to 5) on the Breast Cancer and Ionosphere datasets.]
  • 61. SVM Performance. Anecdotally they work very very well indeed. Example: they are currently the best-known classifier on a well-studied hand-written-character recognition benchmark. Another example: Andrew knows several reliable people doing practical real-world work who claim that SVMs have saved them when their other favorite classifiers did poorly. There is a lot of excitement and religious fervor about SVMs as of 2001. Despite this, some practitioners are a little skeptical.
  • 62. Diffusion Kernel. A kernel function describes the correlation or similarity between two data points. Suppose we have a function s(x,y) that describes the similarity between two data points, and assume it is non-negative and symmetric. How can we generate a kernel function based on this similarity function? A graph-theory approach…
  • 63. Diffusion Kernel. Create a graph for the data points: each vertex corresponds to a data point, and the weight of each edge is the similarity s(x,y). Graph Laplacian: $L_{i,j} = s(x_i,x_j)$ if $i\ne j$, and $L_{i,i} = -\sum_{k\ne i} s(x_i,x_k)$. Property of this Laplacian: it is negative semi-definite.
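A small numpy sketch of this construction (the Gaussian similarity and the six random points are assumptions of the sketch); the row sums vanish and the largest eigenvalue is 0, consistent with negative semi-definiteness:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 2))

    S = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))  # a symmetric similarity
    np.fill_diagonal(S, 0.0)
    L = S - np.diag(S.sum(axis=1))   # off-diag s(x_i,x_j); diag -sum_k s(x_i,x_k)

    print("row sums (~0):", L.sum(axis=1).round(12))
    print("largest eigenvalue (<= 0):", round(float(np.linalg.eigvalsh(L).max()), 12))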
  • 64–65. Diffusion Kernel (slide 65 repeats slide 64). Consider a simple Laplacian: $L_{i,j} = 1$ if $x_i$ and $x_j$ are connected and $i\ne j$, and $L_{i,i} = -\sum_{x_k\in N(x_i)} 1$ (minus the number of neighbors of $x_i$). Consider $L^2, L^4, \ldots$ What do these matrices represent? A diffusion kernel.
  • 66. Diffusion Kernel: Properties. $K = e^{\beta L}$, or equivalently $\frac{dK_\beta}{d\beta} = LK_\beta$. It is positive definite; the local relationships in L induce global relationships; it works for undirected weighted graphs with symmetric similarities, $s(x_i,x_j) = s(x_j,x_i)$. How do we compute the diffusion kernel $e^{\beta L}$?
  • 67. Computing Diffusion Kernel. Eigendecomposition of the Laplacian: $L = U\Sigma U^T = \sum_{i=1}^{m}\sigma_i\mathbf{u}_i\mathbf{u}_i^T$ with $U = (\mathbf{u}_1,\ldots,\mathbf{u}_m)$ and $\mathbf{u}_i^T\mathbf{u}_j = \delta_{i,j}$. What is $L^2$? $L^2 = \sum_{i,j}\sigma_i\sigma_j\,\mathbf{u}_i\mathbf{u}_i^T\mathbf{u}_j\mathbf{u}_j^T = \sum_{i=1}^{m}\sigma_i^2\,\mathbf{u}_i\mathbf{u}_i^T$.
  • 68. Computing Diffusion Kernel. What about $L^n$? $L^n = \sum_{i=1}^{m}\sigma_i^n\,\mathbf{u}_i\mathbf{u}_i^T$. Compute the diffusion kernel: $K = e^{\beta L} = \sum_{n=0}^{\infty}\frac{\beta^n L^n}{n!} = \sum_{i=1}^{m}\left(\sum_{n=0}^{\infty}\frac{\beta^n\sigma_i^n}{n!}\right)\mathbf{u}_i\mathbf{u}_i^T = \sum_{i=1}^{m}e^{\beta\sigma_i}\,\mathbf{u}_i\mathbf{u}_i^T$.
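A sketch of this computation via numpy's symmetric eigendecomposition (β = 0.5 is an arbitrary choice), cross-checked against a truncated series Σₙ (βL)ⁿ/n!:

    import math
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 2))
    S = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))
    np.fill_diagonal(S, 0.0)
    L = S - np.diag(S.sum(axis=1))                  # the Laplacian of slide 63

    beta = 0.5
    sigma, U = np.linalg.eigh(L)                    # L = U diag(sigma) U^T
    K = U @ np.diag(np.exp(beta * sigma)) @ U.T     # K = sum_i e^(beta sigma_i) u_i u_i^T

    series = sum(np.linalg.matrix_power(beta * L, n) / math.factorial(n)
                 for n in range(25))
    print("K is PSD:", bool(np.linalg.eigvalsh(K).min() > -1e-10))
    print("matches sum_n (beta L)^n / n!:", bool(np.allclose(K, series)))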
  • 69. Doing multi-class classification. SVMs can only handle two-class outputs (i.e. a categorical output variable with arity 2). What can be done? Answer: with output arity N, learn N SVMs: SVM 1 learns "Output==1" vs "Output != 1", SVM 2 learns "Output==2" vs "Output != 2", …, SVM N learns "Output==N" vs "Output != N". Then to predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.
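A direct transcription of this recipe, assuming scikit-learn and the three-class iris data:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.svm import SVC  # assumes scikit-learn; iris has three classes

    X, y = load_iris(return_X_y=True)
    machines = {c: SVC(kernel="linear").fit(X, np.where(y == c, 1, -1))
                for c in np.unique(y)}

    # Predict the class whose SVM pushes the point furthest into the positive region.
    scores = np.column_stack([machines[c].decision_function(X)
                              for c in sorted(machines)])
    print("train accuracy:", float((scores.argmax(axis=1) == y).mean()))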
  • 70. Ranking Problem. Consider a problem of ranking essays with three ranking categories: good, ok, bad. Given an input document, predict its ranking category. How should we formulate this problem? A simple multi-class solution: treat each ranking category as an independent class. But there is something missing here… we miss the ordinal relationship between the classes!
  • 71. Ordinal Regression. [Figure: 'good', 'OK', and 'bad' regions separated along two candidate projection directions w and w′.] Which choice is better? How could we formulate this problem?
  • 72–74. Ordinal Regression (slides 73 and 74 repeat slide 72). The two decision boundaries are $\mathbf{w}\cdot\mathbf{x}+b_1 = 0$ and $\mathbf{w}\cdot\mathbf{x}+b_2 = 0$. The margin for ordinal regression: $\text{margin}_1(\mathbf{w},b_1) = \min_{\mathbf{x}\in D_g\cup D_o} d(\mathbf{x}) = \min_{\mathbf{x}\in D_g\cup D_o}\frac{|\mathbf{w}\cdot\mathbf{x}+b_1|}{\sqrt{\sum_i w_i^2}}$ and $\text{margin}_2(\mathbf{w},b_2) = \min_{\mathbf{x}\in D_o\cup D_b}\frac{|\mathbf{w}\cdot\mathbf{x}+b_2|}{\sqrt{\sum_i w_i^2}}$, with $\text{margin}(\mathbf{w},b_1,b_2) = \min\big(\text{margin}_1(\mathbf{w},b_1),\ \text{margin}_2(\mathbf{w},b_2)\big)$. Maximize the margin: $\{\mathbf{w}^*,b_1^*,b_2^*\} = \underset{\mathbf{w},b_1,b_2}{\operatorname{argmax}}\ \text{margin}(\mathbf{w},b_1,b_2)$.
  • 75. Ordinal Regression. How do we solve this monster? $\{\mathbf{w}^*,b_1^*,b_2^*\} = \underset{\mathbf{w},b_1,b_2}{\operatorname{argmax}}\ \min\!\left(\min_{\mathbf{x}\in D_g\cup D_o}\frac{|\mathbf{w}\cdot\mathbf{x}+b_1|}{\sqrt{\sum_i w_i^2}},\ \min_{\mathbf{x}\in D_o\cup D_b}\frac{|\mathbf{w}\cdot\mathbf{x}+b_2|}{\sqrt{\sum_i w_i^2}}\right)$ subject to: $\mathbf{x}_i\in D_g:\ \mathbf{w}\cdot\mathbf{x}_i+b_1 \ge 0$; $\mathbf{x}_i\in D_o:\ \mathbf{w}\cdot\mathbf{x}_i+b_1 \le 0,\ \mathbf{w}\cdot\mathbf{x}_i+b_2 \ge 0$; $\mathbf{x}_i\in D_b:\ \mathbf{w}\cdot\mathbf{x}_i+b_2 \le 0$.
  • 76. Ordinal Regression. The same old trick: to remove the scaling invariance, set $\min_{\mathbf{x}\in D_g\cup D_o}|\mathbf{w}\cdot\mathbf{x}+b_1| = 1$ and $\min_{\mathbf{x}\in D_o\cup D_b}|\mathbf{w}\cdot\mathbf{x}+b_2| = 1$. Now the problem is simplified as: $\{\mathbf{w}^*,b_1^*,b_2^*\} = \underset{\mathbf{w},b_1,b_2}{\operatorname{argmin}}\ \sum_{i=1}^{d}w_i^2$ subject to: $\mathbf{x}_i\in D_g:\ \mathbf{w}\cdot\mathbf{x}_i+b_1 \ge 1$; $\mathbf{x}_i\in D_o:\ \mathbf{w}\cdot\mathbf{x}_i+b_1 \le -1,\ \mathbf{w}\cdot\mathbf{x}_i+b_2 \ge 1$; $\mathbf{x}_i\in D_b:\ \mathbf{w}\cdot\mathbf{x}_i+b_2 \le -1$.
  • 77. Ordinal Regression. Noisy case: $\{\mathbf{w}^*,b_1^*,b_2^*\} = \underset{\mathbf{w},b_1,b_2}{\operatorname{argmin}}\ \sum_{i=1}^{d}w_i^2 + c\sum_{\mathbf{x}_i\in D_g}\varepsilon_i + c\sum_{\mathbf{x}_i\in D_o}(\varepsilon_i+\varepsilon_i') + c\sum_{\mathbf{x}_i\in D_b}\varepsilon_i$ subject to: $\mathbf{x}_i\in D_g:\ \mathbf{w}\cdot\mathbf{x}_i+b_1 \ge 1-\varepsilon_i,\ \varepsilon_i\ge 0$; $\mathbf{x}_i\in D_o:\ \mathbf{w}\cdot\mathbf{x}_i+b_1 \le -1+\varepsilon_i,\ \mathbf{w}\cdot\mathbf{x}_i+b_2 \ge 1-\varepsilon_i',\ \varepsilon_i\ge 0,\ \varepsilon_i'\ge 0$; $\mathbf{x}_i\in D_b:\ \mathbf{w}\cdot\mathbf{x}_i+b_2 \le -1+\varepsilon_i,\ \varepsilon_i\ge 0$. Is this sufficient?
  • 78. Ordinal Regression. Not quite: nothing in the noisy-case program keeps the two boundaries in order, so add the constraint $b_1 \le b_2$ to the formulation of slide 77. [Figure: 'good', 'OK', and 'bad' regions separated by the two parallel boundaries along w.]
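Once such a model is fitted, prediction follows the constraint pattern above. A sketch with made-up values of w, b₁, b₂ (satisfying b₁ ≤ b₂):

    import numpy as np

    w, b1, b2 = np.array([1.0, 0.5]), -2.0, 1.0   # made-up fitted values, b1 <= b2

    def rank(x):
        score = w @ x
        if score + b1 > 0:
            return "good"    # above the upper boundary  w.x + b1 = 0
        if score + b2 > 0:
            return "OK"      # between the two parallel boundaries
        return "bad"         # below the lower boundary  w.x + b2 = 0

    for x in ([3.0, 1.0], [1.0, 0.5], [-2.0, -1.0]):
        print(x, "->", rank(np.array(x)))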
  • 79. References. An excellent tutorial on VC-dimension and Support Vector Machines: C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, 2(2):121-167, 1998. http://citeseer.nj.nec.com/burges98tutorial.html. The VC/SRM/SVM bible (not for beginners, including myself): Vladimir Vapnik, Statistical Learning Theory, Wiley-Interscience, 1998. Software: SVM-light, http://svmlight.joachims.org/, free download.