3. MissSVM Train

INPUT: {(X_1, y_1), (X_2, y_2), ..., (X_m, y_m)}, where X_i = {x_{i1}, x_{i2}, ..., x_{i n_i}},
and y_i = +1 if ∃ x_{ik} ∈ X_i such that x_{ik} is positive, y_i = −1 otherwise.

1: REORDER the bag list so that negative bags come first:
   {X_1^−, X_2^−, ..., X_q^−, X_{q+1}^+, ..., X_m^+}

2: MAP the bags into an instance list:
   L_x = {x_{1,1}, ..., x_{1,n_1}, x_{2,1}, ..., x_{m,1}, ..., x_{m,n_m}}

   Given a set of labeled negative instances {(x_1, −1), (x_2, −1), ..., (x_{T_L}, −1)} and a
   set of unlabeled instances {x_{T_L+1}, ..., x_T}, learn a function F_s : x → {−1, +1},
   subject to: for i = q+1, ..., m, at least one instance in {x_{s_i}, ..., x_{e_i}} is +1.

   T     total number of instances in the training set
   T_L   total number of instances labeled −1
   q     total number of negative bags
   m     total number of bags
   s_i   index in L_x of the first instance belonging to the i-th bag
   e_i   index in L_x of the last instance belonging to the i-th bag

3: LEARN a semi-supervised SVM F_s, with user-defined parameters λ, γ, δ
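Steps 1–2 above can be sketched as follows. This is a hypothetical illustration of the data preparation only (names such as `prepare_instances` are our own, not from the algorithm's description): bags are assumed to arrive as (instance-list, label) pairs.

```python
# Hypothetical sketch of MissSVM's data preparation (steps 1-2).
import numpy as np

def prepare_instances(bags):
    """Reorder bags (negatives first) and flatten into an instance list.

    Returns the instance matrix Lx, the count T_L of instances labeled -1,
    and the (s_i, e_i) index spans of the positive bags.
    """
    # 1: REORDER -- negative bags before positive bags
    ordered = sorted(bags, key=lambda bag: bag[1])  # label -1 sorts first
    # 2: MAP -- concatenate instances, remembering each positive bag's span
    instances, spans, t_l = [], [], 0
    for X, y in ordered:
        start = len(instances)
        instances.extend(X)
        if y == -1:
            t_l = len(instances)            # all negatives precede positives
        else:
            spans.append((start, len(instances) - 1))  # (s_i, e_i), inclusive
    return np.asarray(instances), t_l, spans

bags = [([[1.0, 2.0]], +1), ([[0.0, 0.0], [0.1, 0.2]], -1)]
Lx, TL, pos_spans = prepare_instances(bags)
print(TL)         # 2 -- instances carrying the label -1
print(pos_spans)  # [(2, 2)] -- index span of the single positive bag
```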
4. MissSVM Predict

INPUT: X = {x_1, x_2, ..., x_n}
OUTPUT: y ∈ {−1, +1}

RETURN +1 if ∃ x_i ∈ X such that F_s(x_i) = +1, −1 otherwise
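The prediction rule is the standard multi-instance one: a bag is positive iff the instance-level classifier F_s fires on at least one of its instances. A minimal sketch, where `f_s` is a toy stand-in for the trained decision function (an assumption for illustration):

```python
# Bag-level prediction: +1 iff any instance is classified +1 by F_s.

def predict_bag(f_s, bag):
    """Return +1 if any instance in the bag is classified +1, else -1."""
    return +1 if any(f_s(x) == +1 for x in bag) else -1

# Toy stand-in for F_s: positive iff the first feature exceeds 0.5.
f_s = lambda x: +1 if x[0] > 0.5 else -1
print(predict_bag(f_s, [[0.1], [0.9]]))  # 1
print(predict_bag(f_s, [[0.1], [0.2]]))  # -1
```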
5. F_s: Learn a Semi-Supervised SVM

Optimization problem of a popular semi-supervised SVM:

   min_f  (1/2)||f||_H^2 + λ Σ_{t=1}^{T_L} H_1(y_t f(x_t)) + δ Σ_{t=T_L+1}^{T} D(f(x_t))

   f : x → R
   ||f||_H                        norm of f in the Reproducing Kernel Hilbert Space H
   H_1(z) = max{0, 1 − z}         hinge loss
   D(z) = min{H_1(z), H_1(−z)}    a non-convex, hat-shaped loss function

Considerations:
1. Positive constraints: for i = q+1, ..., m, at least one instance in {x_{s_i}, ..., x_{e_i}} is +1.
2. Dimension of the space: the optimization problem must be reduced from a possibly
   infinite-dimensional space to a finite-dimensional one.
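The two losses in the objective can be sketched directly from their definitions: the convex hinge H_1 penalizes labeled points on the wrong side of the margin, while the non-convex "hat" loss D penalizes unlabeled points that fall near the decision boundary (largest at z = 0, zero once |z| ≥ 1).

```python
# The two losses of the semi-supervised SVM objective, as defined above.

def hinge(z):
    """H1(z) = max{0, 1 - z}."""
    return max(0.0, 1.0 - z)

def hat(z):
    """D(z) = min{H1(z), H1(-z)}: pushes unlabeled points away from the boundary."""
    return min(hinge(z), hinge(-z))

print(hinge(0.0))  # 1.0
print(hat(0.0))    # 1.0 -- maximal penalty exactly on the decision boundary
print(hat(1.5))    # 0.0 -- no penalty once |f(x)| >= 1
```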
6. Modified to be a CCCP (Constrained Concave-Convex Procedure)

   min_{α,η,θ,ε,ξ,b}  (1/2) α'Kα + λ η'1 + γ θ'1 + δ min(ε, ξ)'1

   s.t.  (−1)(k_t'α + b) + η_t ≥ 1,                      η_t ≥ 0,        t = 1, 2, ..., T_L;
         max_{t=s_i,...,e_i} (k_t'α + b) + θ_{i−q} ≥ 1,  θ_{i−q} ≥ 0,    i = q+1, ..., m;
         (k_t'α + b) + ε_{t−T_L} ≥ 1,                    ε_{t−T_L} ≥ 0,  t = T_L+1, ..., T;
         (−1)(k_t'α + b) + ξ_{t−T_L} ≥ 1,                ξ_{t−T_L} ≥ 0,  t = T_L+1, ..., T.

   η = [η_1, ..., η_{T_L}]'    slack variables for the errors on instances of negative bags
   θ = [θ_1, ..., θ_p]'        slack variables for the errors on positive bags
   ε = [ε_1, ..., ε_{T_U}]', ξ = [ξ_1, ..., ξ_{T_U}]'
                               slack variables for the errors on instances of positive bags
   λ, γ, δ                     user-defined parameters that trade off complexity against errors
   K                           a T×T kernel matrix
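The idea behind CCCP can be shown on a toy problem. This 1-D example is our own illustration, not the MissSVM QP itself: a non-convex objective written as convex + concave, here f(x) = x² − |x| (concave part −|x|), is minimized by repeatedly linearizing the concave part at the current point and solving the resulting convex subproblem.

```python
# Generic 1-D illustration of the concave-convex procedure (CCCP):
# minimize f(x) = x^2 - |x| by linearizing the concave part -|x|.

def cccp_min(x0, iters=20):
    x = x0
    for _ in range(iters):
        g = -1.0 if x >= 0 else 1.0   # subgradient of the concave part -|x|
        # Convex subproblem: minimize x^2 + g*x, closed form x = -g/2.
        x = -g / 2.0
    return x

print(cccp_min(0.3))   # 0.5  -- a local minimum of x^2 - |x|
print(cccp_min(-0.3))  # -0.5 -- the symmetric local minimum
```

As in MissSVM, each iteration solves a convex problem whose solution monotonically decreases the original non-convex objective.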
7. Revised to be a standard QP

At each iteration, the non-convex terms are replaced by subgradients taken at the current
solution (a superscript a marks values from the current iteration):

   min_{α,η,θ,ε,ξ,b}  (1/2) α'Kα + λ η'1 + γ θ'1 + δ ∂(min(ε^a, ξ^a)'1)(ε, ξ)

   s.t.  (−1)(k_t'α + b) + η_t ≥ 1,                          η_t ≥ 0,        t = 1, 2, ..., T_L;
         Σ_{t=s_i}^{e_i} β_{it}^a (k_t'α + b) + θ_{i−q} ≥ 1,  θ_{i−q} ≥ 0,    i = q+1, ..., m;
         (k_t'α + b) + ε_{t−T_L} ≥ 1,                        ε_{t−T_L} ≥ 0,  t = T_L+1, ..., T;
         (−1)(k_t'α + b) + ξ_{t−T_L} ≥ 1,                    ξ_{t−T_L} ≥ 0,  t = T_L+1, ..., T.
The weights β_{it} distribute the subgradient of the max constraint over the instances of the
i-th bag (n^a denotes the number of instances attaining the maximum at the current iteration):

   β_{it} = 0,       if k_t'α ≠ max_{r=s_i,...,e_i} k_r'α;
   β_{it} = 1/n^a,   otherwise.

   ∂ max_{t=s_i,...,e_i} (k_t'α) = Σ_{t=s_i}^{e_i} β_{it} k_t'
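The β_{it} weights above amount to spreading unit mass uniformly over the maximizing instances of a bag. A minimal sketch (the helper name `beta_weights` is ours):

```python
# Subgradient weights beta_{it} for one positive bag's span [s_i, e_i]:
# weight 1/n^a on each instance attaining max_t k_t' alpha, 0 elsewhere.
import numpy as np

def beta_weights(scores):
    """scores[t] = k_t' alpha for t in [s_i, e_i]; returns the beta_{it}."""
    scores = np.asarray(scores, dtype=float)
    is_max = scores == scores.max()
    return is_max / is_max.sum()   # 1/n^a on the maximizers, 0 elsewhere

print(beta_weights([0.2, 0.9, 0.9]))  # [0.  0.5 0.5]
```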
8. Dataset: Musk

Data generated in research on drug activity prediction.

   Algorithm         Musk1   Musk2
   MissSVM           87.6    82.3
   mi-SVM            87.4    83.6
   MI-SVM            77.9    84.3
   Diverse Density   88.9    82.5

Table 1. Predictive accuracy (%) on the Musk data