Support Vector Machine
1. Introduction to Support Vector Machine
Lucas Xu
September 4, 2012
Lucas Xu Introduction to Support Vector Machine September 4, 2012 1 / 20
2. Outline
1 Classifier
2 Hyper-Plane
3 Convex Optimization
4 Kernel
5 Application
3. Classifier
Attributes and Class Labels
Training Data
$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$, $x^{(i)} \in \mathbb{R}^d$, $y^{(i)} \in \{-1, 1\}$
4. Classifier
Umeng Gender Classification Data
user     app1  app2  ···  app_d  gender
user1    1     0     ···  0      male
user2    0     1     ···  1      female
⋮        ⋮     ⋮     ⋱    ⋮      ⋮
user_n   1     1     ···  1      female
Each app belongs to exactly one category (≈ 20 categories in total).
The categories are mutually exclusive.
5. Classifier
Umeng Gender Classification Data
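$S = \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$, $x^{(i)} \in \mathbb{R}^d$, $y^{(i)} \in \{-1, 1\}$
$x_k^{(i)} \in \{0, 1\}$: 0 means app $k$ is not installed, 1 means it is installed on the device
$1 \le k \le d$, $d \approx 30{,}000$ (about 30,000 apps)
$y^{(i)} \in \{\text{male}, \text{female}\}$, encoded as the two labels in $\{-1, 1\}$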
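A minimal sketch of how such a training set could be encoded as a 0/1 matrix $X$ and a label vector $y$. The app names, the users, and the choice male → 1, female → −1 are all hypothetical; the slides do not say which gender is mapped to which label.

```python
import numpy as np

# Hypothetical app vocabulary; the real data has d ≈ 30,000 apps.
APPS = ["app1", "app2", "app3", "app4"]
APP_INDEX = {name: k for k, name in enumerate(APPS)}

# Assumed label encoding (the slides do not say which gender is +1).
LABEL = {"male": 1, "female": -1}


def encode(users):
    """Turn (installed_apps, gender) pairs into X in {0,1}^(m x d) and y in {-1,1}^m."""
    X = np.zeros((len(users), len(APPS)), dtype=np.int8)
    y = np.empty(len(users), dtype=np.int8)
    for i, (installed, gender) in enumerate(users):
        for app in installed:
            X[i, APP_INDEX[app]] = 1  # x_k^(i) = 1: app k is installed
        y[i] = LABEL[gender]
    return X, y


# Hypothetical users, in the spirit of the table above.
users = [
    ({"app1"}, "male"),
    ({"app2", "app4"}, "female"),
    ({"app1", "app2", "app3", "app4"}, "female"),
]
X, y = encode(users)
print(X.tolist(), y.tolist())  # [[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 1, 1]] [1, -1, -1]
```

With 30,000 mostly-zero columns a sparse matrix would be the more economical storage choice; a dense array just keeps the sketch short.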
6. Hyper-Plane
Figure: Hyper-plane
The hyper-plane: $w^T x + b = 0$
Classification function: $h_{w,b}(x) = g(w^T x + b)$, where
$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
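The classification function is a one-liner in NumPy. A small sketch, assuming a hypothetical 2-D hyper-plane $x_1 + x_2 - 1 = 0$ (that is, $w = (1, 1)$, $b = -1$):

```python
import numpy as np


def g(z):
    """Threshold function: 1 if z >= 0, -1 otherwise (elementwise)."""
    return np.where(z >= 0, 1, -1)


def h(w, b, X):
    """h_{w,b}(x) = g(w^T x + b) for every row x of X."""
    return g(X @ w + b)


# Hypothetical hyper-plane x1 + x2 - 1 = 0 and three query points.
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0], [0.0, 0.0], [0.5, 0.5]])
print(h(w, b, X).tolist())  # [1, -1, 1]
```

The point $(0.5, 0.5)$ lies exactly on the hyper-plane and is labelled 1 because $g$ uses $z \ge 0$.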
7. Hyper-Plane
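Functional Margin:
$$\hat{\gamma}^{(i)} = y^{(i)} (w^T x^{(i)} + b)$$
Scaling: multiplying $(w, b)$ by any $c > 0$ multiplies $\hat{\gamma}^{(i)}$ by $c$ without moving the hyper-plane, so impose the normalization condition $\|w\| = 1$.
Geometric Margin:
$$\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)$$
$\gamma^{(i)}$ is the signed distance from $x^{(i)}$ to the hyper-plane; a large positive $\gamma^{(i)}$ means $x^{(i)}$ is classified correctly with high confidence.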
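The effect of scaling can be checked numerically: rescaling $(w, b)$ changes the functional margins but not the geometric ones. A sketch on hypothetical toy data and the hyper-plane $x_1 + x_2 - 1 = 0$:

```python
import numpy as np


def functional_margins(w, b, X, y):
    """hat-gamma^(i) = y^(i) (w^T x^(i) + b) for every row x^(i) of X."""
    return y * (X @ w + b)


def geometric_margins(w, b, X, y):
    """gamma^(i) = y^(i) ((w / ||w||)^T x^(i) + b / ||w||)."""
    norm = np.linalg.norm(w)
    return y * (X @ (w / norm) + b / norm)


# Hypothetical toy data and hyper-plane x1 + x2 - 1 = 0.
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([1, -1])
w, b = np.array([1.0, 1.0]), -1.0

for c in (1.0, 10.0):  # rescale (w, b) by c > 0: same hyper-plane
    print(c, functional_margins(c * w, c * b, X, y), geometric_margins(c * w, c * b, X, y))
```

The functional margins go from $(3, 1)$ to $(30, 10)$, while the geometric margins stay at $(3/\sqrt{2}, 1/\sqrt{2})$.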
8. Hyper-Plane
Definition
The geometric margin of $(w, b)$ with respect to the training set $S$:
$$\gamma = \min_{i=1,\dots,m} \gamma^{(i)}$$
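The dataset margin is just the smallest per-example geometric margin. A sketch, assuming hypothetical separable toy data and two parallel candidate hyper-planes; the optimal margin classifier would prefer the one with the larger $\gamma$:

```python
import numpy as np


def dataset_margin(w, b, X, y):
    """gamma = min_i gamma^(i), the geometric margin of (w, b) on the training set."""
    return (y * (X @ w + b) / np.linalg.norm(w)).min()


# Hypothetical separable toy data.
X = np.array([[2.0, 2.0], [3.0, 0.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])
for b in (-1.0, -1.5):  # two parallel candidate hyper-planes
    print(b, dataset_margin(w, b, X, y))
```

Here $b = -1$ gives $\gamma = 1/\sqrt{2} \approx 0.707$ and $b = -1.5$ gives $\gamma = 1.5/\sqrt{2} \approx 1.061$, so the second hyper-plane separates the data with the larger margin.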
9. Hyper-Plane
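The optimal margin classifier (intuition): find the decision boundary that maximizes the margin.
$$\begin{aligned} \max_{\gamma, w, b} \quad & \gamma \\ \text{s.t.} \quad & y^{(i)} (w^T x^{(i)} + b) \ge \gamma, \quad i = 1, \dots, m \\ & \|w\| = 1 \end{aligned}$$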
10. Hyper-Plane
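Normalization constraint: drop $\|w\| = 1$, maximize the geometric margin $\hat{\gamma} / \|w\|$ instead, and let the functional margin $\hat{\gamma} = 1$ (always possible by rescaling $(w, b)$, which does not change $\hat{\gamma} / \|w\|$)
⇓
$$\begin{aligned} \max_{w, b} \quad & \frac{1}{\|w\|} \\ \text{s.t.} \quad & y^{(i)} (w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m \end{aligned}$$
⇓ (maximizing $1 / \|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$)
$$\begin{aligned} \min_{w, b} \quad & \frac{1}{2}\|w\|^2 \\ \text{s.t.} \quad & y^{(i)} (w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m \end{aligned}$$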
11. Hyper-Plane
The objective $\frac{1}{2}\|w\|^2$ is a convex function.
The feasible set $\{(w, b) : y^{(i)} (w^T x^{(i)} + b) \ge 1, \ i = 1, \dots, m\}$ is a convex set (an intersection of half-spaces).
So this is a so-called quadratic programming (QP) problem, and there are many software packages to solve it.
Basic ideas for Support Vector Machine: DONE!
More efficient solution?
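For illustration only, the QP can be handed to a general-purpose solver. This sketch assumes SciPy is available and uses `scipy.optimize.minimize` with the SLSQP method rather than a dedicated QP package; `hard_margin_primal` and the four-point data set are hypothetical, and the data are chosen to be linearly separable so the problem is feasible.

```python
import numpy as np
from scipy.optimize import minimize


def hard_margin_primal(X, y):
    """Solve min_{w,b} 1/2 ||w||^2 s.t. y_i (w^T x_i + b) >= 1 with SLSQP.

    The variables are packed as theta = [w_1, ..., w_d, b].
    Assumes linearly separable data; otherwise the problem is infeasible.
    """
    d = X.shape[1]

    def objective(theta):
        w = theta[:d]
        return 0.5 * w @ w

    def margins_minus_one(theta):
        w, b = theta[:d], theta[d]
        return y * (X @ w + b) - 1.0  # SLSQP "ineq" means each entry >= 0

    res = minimize(
        objective,
        x0=np.zeros(d + 1),
        method="SLSQP",
        constraints=[{"type": "ineq", "fun": margins_minus_one}],
    )
    return res.x[:d], res.x[d], res


# Hypothetical, linearly separable toy data.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [3.0, 3.0], [-2.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w, b, res = hard_margin_primal(X, y)
print(res.success, w, b)  # the exact optimum here is w = (0.5, 0.5), b = 0
```

A generic solver like this is fine for a handful of points but scales poorly to large $m$, which is the motivation for the dual formulation that follows.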
16. Convex Optimization
Primal Problem:
$$\begin{aligned} \min_{w, b} \quad & \frac{1}{2}\|w\|^2 \\ \text{s.t.} \quad & y^{(i)} (w^T x^{(i)} + b) \ge 1, \quad i = 1, \dots, m \end{aligned}$$
17. Convex Optimization
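Lagrangian for the original problem:
$$\min_{w, b} \max_{\alpha : \alpha_i \ge 0} L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)} (w^T x^{(i)} + b) - 1 \right]$$
⇓
Under the K.K.T. conditions, the problem transforms into its dual problem (swap min and max, then set $\partial L / \partial w = 0$ and $\partial L / \partial b = 0$):
$$\begin{aligned} \max_{\alpha} \quad & W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle \\ \text{s.t.} \quad & \alpha_i \ge 0, \quad i = 1, \dots, m \\ & \sum_{i=1}^{m} \alpha_i y^{(i)} = 0 \end{aligned}$$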
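The dual is also a small QP, now in $\alpha$ only. A sketch under the same assumptions as before: SciPy's SLSQP method stands in for a dedicated SVM solver (such as SMO, which this is not), the toy data are hypothetical, and `hard_margin_dual` is an illustrative name. It returns $\alpha$ and rebuilds $w^*$ from it.

```python
import numpy as np
from scipy.optimize import minimize


def hard_margin_dual(X, y):
    """Maximize W(alpha) s.t. alpha_i >= 0 and sum_i alpha_i y_i = 0 (via SLSQP)."""
    m = X.shape[0]
    Q = np.outer(y, y) * (X @ X.T)  # Q_ij = y_i y_j <x_i, x_j>

    def neg_W(alpha):  # SLSQP minimizes, so use -W(alpha)
        return 0.5 * alpha @ Q @ alpha - alpha.sum()

    def neg_W_grad(alpha):
        return Q @ alpha - np.ones(m)

    res = minimize(
        neg_W,
        x0=np.zeros(m),
        jac=neg_W_grad,
        method="SLSQP",
        bounds=[(0.0, None)] * m,
        constraints=[{"type": "eq", "fun": lambda a: a @ y}],
    )
    return res.x, res


# Hypothetical, linearly separable toy data.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [3.0, 3.0], [-2.0, -1.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha, res = hard_margin_dual(X, y)
w = (alpha * y) @ X  # w* = sum_i alpha_i y_i x_i
print(np.round(alpha, 4), w)
```

Only the two points closest to the boundary end up with $\alpha_i > 0$; the other two get $\alpha_i \approx 0$.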
18. Convex Optimization
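Solutions:
$$w^* = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$$
$$b^* = -\frac{\max_{i : y^{(i)} = -1} w^{*T} x^{(i)} + \min_{i : y^{(i)} = 1} w^{*T} x^{(i)}}{2}$$
Predict:
$$\begin{aligned} g(x) &= w^T x + b \\ &= \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)T} x + b \\ &= \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b \end{aligned}$$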
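These formulas can be checked by hand on a tiny example. The sketch assumes hypothetical toy data whose dual solution $\alpha = (1/4, 1/4, 0, 0)$ was worked out by hand (it satisfies the K.K.T. conditions, so it is optimal), and shows that the inner-product form of the prediction agrees with $w^{*T} x + b^*$.

```python
import numpy as np

# Hypothetical toy data with a hand-checked dual solution: alpha_1 = alpha_2 = 1/4
# (the support vectors), alpha_3 = alpha_4 = 0; these satisfy the K.K.T. conditions.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [3.0, 3.0], [-2.0, -1.0]])
y = np.array([1, -1, 1, -1])
alpha = np.array([0.25, 0.25, 0.0, 0.0])

w_star = (alpha * y) @ X  # w* = sum_i alpha_i y^(i) x^(i)

scores = X @ w_star  # w*^T x^(i) for every training example
b_star = -(scores[y == -1].max() + scores[y == 1].min()) / 2


def decision_primal(x):
    """w*^T x + b*"""
    return w_star @ x + b_star


def decision_dual(x):
    """sum_i alpha_i y^(i) <x^(i), x> + b*: only inner products with x are needed."""
    return sum(a * yi * (xi @ x) for a, yi, xi in zip(alpha, y, X)) + b_star


x_new = np.array([2.0, -0.5])
print(w_star, b_star, decision_primal(x_new), decision_dual(x_new))
```

Both decision values equal 0.75 for `x_new`, so it is predicted as class 1; the terms with $\alpha_i = 0$ contribute nothing to `decision_dual`.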
19. Kernel
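For most $i$, $\alpha_i = 0$.
The examples $(x^{(i)}, y^{(i)})$ with $\alpha_i > 0$ are called support vectors.
Prediction only needs the inner products $\langle x^{(i)}, x \rangle$ with the support vectors.
If we map the feature space $(x_1, x_2, \dots, x_d)$ to another, higher-dimensional space $(z_1, z_2, \dots, z_l)$ with $z = \phi(x)$, prediction needs $\langle \phi(x^{(i)}), \phi(x) \rangle$ instead,
and if we can compute $\langle z^{(i)}, z \rangle = K(x^{(i)}, x)$ easily, we never have to form $z$ explicitly.
Use a slightly different notation:
$$K(x, y) = \langle \phi(x), \phi(y) \rangle$$
Intuitive explanation: $K(x, y)$ is a measure of similarity between $x$ and $y$.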
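To see the idea concretely, take the degree-2 polynomial kernel on 2-D inputs. Expanding $(\langle x, y \rangle + 1)^2$ gives an explicit 6-D map $\phi$, and the kernel computes the same inner product without building $\phi$. A small sketch, assuming $x \in \mathbb{R}^2$ and hypothetical input vectors:

```python
import numpy as np


def poly2_kernel(x, y):
    """K(x, y) = (<x, y> + 1)^2, computed in the original 2-D space."""
    return (x @ y + 1.0) ** 2


def phi(x):
    """Explicit 6-D feature map with <phi(x), phi(y)> = (<x, y> + 1)^2."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([x1**2, x2**2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0])


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(poly2_kernel(x, z), phi(x) @ phi(z))  # both equal 4 (up to rounding)
```

The kernel needs one 2-D dot product; the explicit route builds two 6-D vectors. For higher degrees and dimensions the gap grows quickly, which is what makes the kernel form attractive.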
20. Kernel
Definition
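Mercer Kernel: $K$ is positive semi-definite, i.e. for any finite set of points $x^{(1)}, \dots, x^{(m)}$ the kernel matrix $K_{ij} = K(x^{(i)}, x^{(j)})$ is symmetric positive semi-definite.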
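Positive semi-definiteness can be probed numerically on a sample by building the kernel matrix and checking its smallest eigenvalue. This is only a sanity check under stated assumptions (a random sample of 20 points in $\mathbb{R}^3$, an RBF kernel with an arbitrary $\gamma = 0.5$), not a proof: passing on one sample does not establish the Mercer condition, while a clearly negative eigenvalue does rule it out.

```python
import numpy as np


def rbf(x, y, gamma=0.5):
    """RBF kernel exp(-gamma ||x - y||^2); gamma = 0.5 is an arbitrary choice."""
    return np.exp(-gamma * np.sum((x - y) ** 2))


def gram(kernel, X):
    """Kernel matrix K_ij = kernel(x_i, x_j) for the rows of X."""
    m = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])


def looks_psd(K, tol=1e-10):
    """Numerical check on one sample: symmetric with all eigenvalues >= -tol."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # hypothetical sample of 20 points in R^3
print(looks_psd(gram(rbf, X)))                    # True
print(looks_psd(gram(lambda a, b: -(a @ b), X)))  # False: negated inner product
```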
21. Kernel
Primitive: $K(x, y) = \langle x, y \rangle$
Polynomial: $K(x, y) = (\langle x, y \rangle + 1)^d$
RBF: $K(x, y) = \exp(-\gamma \|x - y\|^2)$
Sigmoid: $K(x, y) = \tanh(\kappa \langle x, y \rangle + c)$
String
Tree
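The four numeric kernels in the list can be written as vectorized functions that return the whole kernel matrix between the rows of `X` and `Y`. The parameter names and defaults (`degree`, `gamma`, `kappa`, `c`) are hypothetical choices for this sketch; `degree` is the exponent written $d$ in the polynomial kernel, not the feature dimension.

```python
import numpy as np


def linear_kernel(X, Y):
    """Primitive kernel <x, y> for all pairs of rows."""
    return X @ Y.T


def polynomial_kernel(X, Y, degree=2):
    """(<x, y> + 1)^degree."""
    return (X @ Y.T + 1.0) ** degree


def rbf_kernel(X, Y, gamma=1.0):
    """exp(-gamma ||x - y||^2), using ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from rounding


def sigmoid_kernel(X, Y, kappa=1.0, c=-1.0):
    """tanh(kappa <x, y> + c)."""
    return np.tanh(kappa * X @ Y.T + c)


X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[1.0, 0.0]])
for k in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(k.__name__, k(X, Y).ravel())
```

Working on whole matrices avoids a Python double loop over pairs; the RBF version expands the squared distance so the matrix can be built from one matrix product.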
27. Apply to Umeng Gender Classification
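Problem Description
Classify the gender of a user based on the apps (s)he installed and the categories of those apps.
Kernel Design
$$K(x, y) = \sum_{i,j=1}^{d} \phi(x_i, y_j)$$
$$\phi(x_i, y_j) = \begin{cases} (1 + w)\, x_i y_j & \text{if } i = j \\ x_i y_j & \text{if } i \ne j \text{ but apps } i \text{ and } j \text{ are in the same category} \\ 0 & \text{if apps } i \text{ and } j \text{ are not in the same category} \end{cases}$$
$w \ge 0$ (a scalar, not the hyper-plane normal) is the extra weight given when two users have installed the same app; it defaults to 1.0.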
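The double sum can be implemented directly from the definition. A minimal sketch, assuming a hypothetical five-app, three-category assignment `CATEGORY` (0-based app and category ids) and two hypothetical users; it costs $O(d^2)$ per pair of users, so it only serves to make the definition concrete.

```python
import numpy as np

# Hypothetical category of each app (0-based ids): apps 0 and 1 share category 0,
# apps 2 and 3 share category 1, app 4 is alone in category 2.
CATEGORY = np.array([0, 0, 1, 1, 2])


def umeng_kernel(x, y, w=1.0):
    """K(x, y) = sum_{i,j} phi(x_i, y_j), where phi(x_i, y_j) is
    (1 + w) x_i y_j if i == j, x_i y_j if i != j but same category, else 0."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i == j:
                total += (1.0 + w) * x[i] * y[j]  # same app: extra weight w
            elif CATEGORY[i] == CATEGORY[j]:
                total += x[i] * y[j]  # different apps, same category
            # apps in different categories contribute 0
    return total


u = np.array([1, 0, 1, 0, 0])  # hypothetical user: apps 0 and 2 installed
v = np.array([1, 1, 0, 1, 0])  # hypothetical user: apps 0, 1 and 3 installed
print(umeng_kernel(u, v), umeng_kernel(u, v, w=0.0))  # 4.0 3.0
```

The shared app 0 contributes $1 + w$, and the two same-category, different-app pairs (apps 0/1 and 2/3) contribute 1 each.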
Experiment Result
28. Apply to Umeng Gender Classification
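$$(x_1, x_2, \dots, x_d) \;\Rightarrow\; (\sqrt{w}\, x_1, \sqrt{w}\, x_2, \dots, \sqrt{w}\, x_d, \; c_1, c_2, \dots, c_{20})$$
$c_k$ counts the number of installed apps belonging to category $k$; with this map, $\langle \phi(x), \phi(y) \rangle = w \langle x, y \rangle + \sum_{k=1}^{20} c_k(x)\, c_k(y)$, which equals the kernel $K(x, y)$ defined on the kernel-design slide.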
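The identity between the double-sum kernel and the explicit feature vector can be checked numerically. A sketch, assuming the same kind of hypothetical five-app, three-category map as a toy stand-in for the roughly 20 Umeng categories; `feature_map` is an illustrative name, and the kernel here is a vectorized rewrite of the double sum.

```python
import numpy as np

CATEGORY = np.array([0, 0, 1, 1, 2])  # hypothetical app -> category map (0-based)
N_CATEGORIES = 3  # hypothetical; the Umeng data has about 20


def feature_map(x, w=1.0):
    """(sqrt(w) x_1, ..., sqrt(w) x_d, c_1, ..., c_K); c_k = installed apps in category k."""
    counts = np.bincount(CATEGORY, weights=x, minlength=N_CATEGORIES)
    return np.concatenate([np.sqrt(w) * x, counts])


def umeng_kernel(x, y, w=1.0):
    """The double-sum kernel, vectorized: same-category pairs count once, plus w on i == j."""
    same = CATEGORY[:, None] == CATEGORY[None, :]
    return float((np.outer(x, y) * same).sum() + w * (x @ y))


u = np.array([1, 0, 1, 0, 0])
v = np.array([1, 1, 0, 1, 0])
for w in (0.0, 1.0, 2.0):
    print(w, feature_map(u, w) @ feature_map(v, w), umeng_kernel(u, v, w))
```

Because the explicit vectors have only $d + 20$ entries, one could in principle train a linear SVM on them instead of evaluating the kernel pairwise; that is a design alternative, not something the slides report.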