Support Vector Machines for Regression
July 15, 2015
Overview
1 Linear Regression
2 Non-linear Regression and Kernels
Linear Regression Model

The linear regression model:

$$f(x) = x^T \beta + \beta_0$$

To estimate $\beta$, we consider minimization of

$$H(\beta, \beta_0) = \sum_{i=1}^{N} V(y_i - f(x_i)) + \frac{\lambda}{2} \|\beta\|^2$$

with a loss function $V$ and a regularization term $\frac{\lambda}{2} \|\beta\|^2$.

• How to apply SVM to solve the linear regression problem?
Linear Regression Model (Cont)

The basic idea:

Given a training data set $(x_1, y_1), ..., (x_N, y_N)$.

Target: find a function $f(x)$ that has at most $\varepsilon$ deviation from the targets $y_i$ for all the training data, and at the same time is as flat (simple) as possible.

In other words, we do not care about errors as long as they are less than $\varepsilon$, but we will not accept any deviation larger than this.
Linear Regression Model (Cont)

• We want to find an "$\varepsilon$-tube" that contains all the samples.

• Intuitively, a tube with a small width tends to over-fit the training data. We should find an $f(x)$ whose $\varepsilon$-tube is as wide as possible (more generalization capability, less prediction error in the future).

• With a given $\varepsilon$, a wider tube corresponds to a smaller $\|\beta\|$ (a flatter function).

• Optimization problem:

$$\min_{\beta, \beta_0} \ \frac{1}{2} \|\beta\|^2 \qquad \text{s.t.} \quad y_i - f(x_i) \le \varepsilon, \quad f(x_i) - y_i \le \varepsilon$$
Linear Regression Model (Cont)

With a given $\varepsilon$, this problem is not always feasible, so we also want to allow some errors. Using slack variables $\xi_i, \xi_i^*$, the new optimization problem is:

$$\min_{\beta, \beta_0, \xi, \xi^*} \ \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*)$$

$$\text{s.t.} \quad y_i - f(x_i) \le \varepsilon + \xi_i^*, \qquad f(x_i) - y_i \le \varepsilon + \xi_i, \qquad \xi_i, \xi_i^* \ge 0$$
Linear Regression Model (Cont)

Let $\lambda = 1/C$ and use an "$\varepsilon$-insensitive" error measure, ignoring errors of size less than $\varepsilon$:

$$V_\varepsilon(r) = \begin{cases} 0 & \text{if } |r| < \varepsilon, \\ |r| - \varepsilon & \text{otherwise.} \end{cases}$$

We then have the minimization of

$$H(\beta, \beta_0) = \sum_{i=1}^{N} V_\varepsilon(y_i - f(x_i)) + \frac{\lambda}{2} \|\beta\|^2$$
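As a concrete illustration, here is a minimal NumPy sketch of $V_\varepsilon$ and the objective $H$ above. The function names and the choice of NumPy are my own; the formulas are exactly the ones on this slide.

```python
import numpy as np

def eps_insensitive_loss(r, eps):
    """V_eps(r): zero inside the eps-band, linear (|r| - eps) outside it."""
    return np.maximum(np.abs(r) - eps, 0.0)

def objective(beta, beta0, X, y, eps, lam):
    """H(beta, beta0) = sum_i V_eps(y_i - f(x_i)) + (lam/2) * ||beta||^2."""
    residuals = y - (X @ beta + beta0)
    return eps_insensitive_loss(residuals, eps).sum() + 0.5 * lam * (beta @ beta)
```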
Linear Regression Model (Cont)

The Lagrange (primal) function:

$$L_P = \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^{N} (\xi_i^* + \xi_i) - \sum_{i=1}^{N} \alpha_i^* (\varepsilon + \xi_i^* - y_i + x_i^T \beta + \beta_0) - \sum_{i=1}^{N} \alpha_i (\varepsilon + \xi_i + y_i - x_i^T \beta - \beta_0) - \sum_{i=1}^{N} (\eta_i^* \xi_i^* + \eta_i \xi_i),$$

which we minimize w.r.t. $\beta, \beta_0, \xi_i, \xi_i^*$. Setting the respective derivatives to 0, we get

$$0 = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i), \qquad \beta = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) x_i, \qquad \alpha_i^{(*)} = C - \eta_i^{(*)} \ \ \forall i$$
Linear Regression Model (Cont)

Substituting into the primal function, we obtain the dual optimization problem:

$$\max_{\alpha_i, \alpha_i^*} \ -\varepsilon \sum_{i=1}^{N} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{N} y_i (\alpha_i^* - \alpha_i) - \frac{1}{2} \sum_{i, i'=1}^{N} (\alpha_i^* - \alpha_i)(\alpha_{i'}^* - \alpha_{i'}) \langle x_i, x_{i'} \rangle$$

$$\text{s.t.} \quad 0 \le \alpha_i, \alpha_i^* \le C \ (= 1/\lambda), \qquad \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) = 0, \qquad \alpha_i \alpha_i^* = 0$$

The solution function has the form

$$\hat{\beta} = \sum_{i=1}^{N} (\hat{\alpha}_i^* - \hat{\alpha}_i) x_i, \qquad \hat{f}(x) = \sum_{i=1}^{N} (\hat{\alpha}_i^* - \hat{\alpha}_i) \langle x, x_i \rangle + \beta_0$$
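The dual is a quadratic program that is rarely solved by hand. As a hedged sketch (assuming scikit-learn is available), `sklearn.svm.SVR` with a linear kernel solves this dual internally; its `dual_coef_` attribute stores the differences $\hat{\alpha}_i^* - \hat{\alpha}_i$ for the support vectors (up to scikit-learn's sign convention), so $\hat{\beta}$ can be recovered exactly as in the formula above. The synthetic data here is illustrative only.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)

# beta_hat = sum_i (alpha*_i - alpha_i) x_i, built from support vectors only
beta_hat = svr.dual_coef_ @ svr.support_vectors_
print(np.allclose(beta_hat, svr.coef_))   # True: same vector
print(len(svr.support_))                  # number of support vectors
```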
Linear Regression Model (Cont)

From the KKT conditions, we have

$$\hat{\alpha}_i^* (\varepsilon + \hat{\xi}_i^* - y_i + \hat{f}(x_i)) = 0$$
$$\hat{\alpha}_i (\varepsilon + \hat{\xi}_i + y_i - \hat{f}(x_i)) = 0$$
$$(C - \hat{\alpha}_i^*) \hat{\xi}_i^* = 0$$
$$(C - \hat{\alpha}_i) \hat{\xi}_i = 0$$

→ For all data points strictly inside the $\varepsilon$-tube, $\hat{\alpha}_i = \hat{\alpha}_i^* = 0$; only data points on or outside the tube may have $\hat{\alpha}_i^* - \hat{\alpha}_i \ne 0$.

→ We do not need all $x_i$ to describe $\hat{\beta}$. The associated data points are called the support vectors.
Linear Regression Model (Cont)

Parameter $\varepsilon$ controls the width of the $\varepsilon$-insensitive tube. Its value affects the number of support vectors used to construct the regression function: the bigger $\varepsilon$, the fewer support vectors are selected and the "flatter" the estimate (see the sketch below).

It is associated with the choice of the loss function ($\varepsilon$-insensitive loss, quadratic loss, Huber loss, etc.).

Parameter $C$ $(= 1/\lambda)$ determines the trade-off between model complexity (flatness) and the degree to which deviations larger than $\varepsilon$ are tolerated.

It can be interpreted as a traditional regularization parameter and estimated, for example, by cross-validation.
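The effect of $\varepsilon$ on sparsity is easy to check empirically. A small sketch (the synthetic sine data and the parameter grid are my own choices, again assuming scikit-learn) sweeping $\varepsilon$ and counting support vectors:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + 0.2 * rng.normal(size=300)

# Larger epsilon -> wider insensitive tube -> fewer support vectors
for eps in [0.01, 0.1, 0.3, 0.5]:
    svr = SVR(kernel="linear", C=1.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")
```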
Non-linear Regression and Kernels

When the data is non-linear, use a map $\varphi$ to transform the data into a higher-dimensional feature space, where performing linear regression becomes possible.
Non-linear Regression and Kernels (Cont)

Suppose we approximate the regression function in terms of a set of basis functions $\{h_m(x)\}$, $m = 1, 2, ..., M$:

$$f(x) = \sum_{m=1}^{M} \beta_m h_m(x) + \beta_0$$

To estimate $\beta$ and $\beta_0$, minimize

$$H(\beta, \beta_0) = \sum_{i=1}^{N} V(y_i - f(x_i)) + \frac{\lambda}{2} \sum_{m=1}^{M} \beta_m^2$$

for some general error measure $V(r)$. The solution has the form

$$\hat{f}(x) = \sum_{i=1}^{N} \hat{\alpha}_i K(x, x_i)$$

with $K(x, x') = \sum_{m=1}^{M} h_m(x) h_m(x')$.
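To make the identity $K(x, x') = \sum_m h_m(x) h_m(x')$ concrete, a small NumPy sketch with an arbitrary hand-picked basis (the basis itself is illustrative, not from the slides):

```python
import numpy as np

def h(x):
    """An illustrative basis h_1..h_M for scalar x (M = 3 here)."""
    return np.array([1.0, x, x ** 2])

def K(x, xp):
    """Kernel induced by the basis: K(x, x') = sum_m h_m(x) h_m(x')."""
    return h(x) @ h(xp)

# The N x N Gram matrix {K(x_i, x_j)} equals H H^T for the basis matrix H
xs = np.array([0.5, -1.0, 2.0])
H = np.stack([h(x) for x in xs])                    # N x M basis matrix
G = np.array([[K(a, b) for b in xs] for a in xs])   # pairwise kernel values
print(np.allclose(G, H @ H.T))                      # True
```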
Non-linear Regression and Kernels (Cont)

Let us work this out for $V(r) = r^2$. Let $\mathbf{H}$ be the $N \times M$ basis matrix with $im$th element $h_m(x_i)$, and for simplicity assume $\beta_0 = 0$. Estimate $\beta$ by minimizing

$$H(\beta) = (y - \mathbf{H}\beta)^T (y - \mathbf{H}\beta) + \lambda \|\beta\|^2$$

Setting the first derivative to zero gives the solution $\hat{y} = \mathbf{H}\hat{\beta}$, with $\hat{\beta}$ determined by

$$-2 \mathbf{H}^T (y - \mathbf{H}\hat{\beta}) + 2 \lambda \hat{\beta} = 0$$
$$-\mathbf{H}^T (y - \mathbf{H}\hat{\beta}) + \lambda \hat{\beta} = 0$$
$$-\mathbf{H}\mathbf{H}^T (y - \mathbf{H}\hat{\beta}) + \lambda \mathbf{H}\hat{\beta} = 0 \quad \text{(premultiplying by } \mathbf{H}\text{)}$$
$$(\mathbf{H}\mathbf{H}^T + \lambda I) \mathbf{H}\hat{\beta} = \mathbf{H}\mathbf{H}^T y$$
$$\mathbf{H}\hat{\beta} = (\mathbf{H}\mathbf{H}^T + \lambda I)^{-1} \mathbf{H}\mathbf{H}^T y$$
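A quick NumPy check of the last identity: the fitted values from the standard $M \times M$ ridge solution $\hat{\beta} = (\mathbf{H}^T\mathbf{H} + \lambda I)^{-1}\mathbf{H}^T y$ match the $N \times N$ dual form above. The random data and dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, lam = 50, 10, 0.5
H = rng.normal(size=(N, M))
y = rng.normal(size=N)

# Primal (M x M) ridge solution: beta_hat = (H^T H + lam I)^{-1} H^T y
beta_hat = np.linalg.solve(H.T @ H + lam * np.eye(M), H.T @ y)

# Dual (N x N) form of the fitted values: (H H^T + lam I)^{-1} H H^T y
fitted_dual = np.linalg.solve(H @ H.T + lam * np.eye(N), H @ H.T @ y)

print(np.allclose(H @ beta_hat, fitted_dual))   # True
```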
Non-linear Regression and Kernels (Cont)

We obtain the estimated function:

$$\begin{aligned}
f(x) &= h(x)^T \hat{\beta} \\
&= h(x)^T \mathbf{H}^T (\mathbf{H}\mathbf{H}^T)^{-1} \mathbf{H}\hat{\beta} \\
&= h(x)^T \mathbf{H}^T (\mathbf{H}\mathbf{H}^T)^{-1} (\mathbf{H}\mathbf{H}^T + \lambda I)^{-1} \mathbf{H}\mathbf{H}^T y \\
&= h(x)^T \mathbf{H}^T [(\mathbf{H}\mathbf{H}^T + \lambda I)(\mathbf{H}\mathbf{H}^T)]^{-1} \mathbf{H}\mathbf{H}^T y \\
&= h(x)^T \mathbf{H}^T [(\mathbf{H}\mathbf{H}^T)(\mathbf{H}\mathbf{H}^T) + \lambda (\mathbf{H}\mathbf{H}^T)]^{-1} \mathbf{H}\mathbf{H}^T y \\
&= h(x)^T \mathbf{H}^T [(\mathbf{H}\mathbf{H}^T)(\mathbf{H}\mathbf{H}^T + \lambda I)]^{-1} \mathbf{H}\mathbf{H}^T y \\
&= h(x)^T \mathbf{H}^T (\mathbf{H}\mathbf{H}^T + \lambda I)^{-1} (\mathbf{H}\mathbf{H}^T)^{-1} \mathbf{H}\mathbf{H}^T y \\
&= h(x)^T \mathbf{H}^T (\mathbf{H}\mathbf{H}^T + \lambda I)^{-1} y \\
&= [K(x, x_1), K(x, x_2), ..., K(x, x_N)] \, \hat{\alpha} \\
&= \sum_{i=1}^{N} \hat{\alpha}_i K(x, x_i)
\end{aligned}$$

where $\hat{\alpha} = (\mathbf{H}\mathbf{H}^T + \lambda I)^{-1} y$.
• The $N \times N$ matrix $\mathbf{H}\mathbf{H}^T$ consists of inner products between pairs of observations $i, i'$: $\{\mathbf{H}\mathbf{H}^T\}_{i,i'} = K(x_i, x_{i'})$.

→ We need not specify or evaluate the large set of functions $h_1(x), h_2(x), ..., h_M(x)$. Only the inner-product kernel $K(x_i, x_{i'})$ need be evaluated, at the $N$ training points and at the points $x$ where predictions are made.

• Some popular choices of $K$ are:

$d$th-degree polynomial: $K(x, x') = (1 + \langle x, x' \rangle)^d$
Radial basis: $K(x, x') = \exp(-\gamma \|x - x'\|^2)$
Neural network: $K(x, x') = \tanh(\kappa_1 \langle x, x' \rangle + \kappa_2)$

• This property depends on the choice of the squared norm $\|\beta\|^2$ as the penalty.
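For reference, a hedged sketch of the three kernels above in NumPy, evaluated on single vectors; the default values of $d$, $\gamma$, $\kappa_1$, $\kappa_2$ are placeholders that a user would tune, not values from the slides:

```python
import numpy as np

def poly_kernel(x, xp, d=3):
    """d-th degree polynomial: K(x, x') = (1 + <x, x'>)^d."""
    return (1.0 + x @ xp) ** d

def rbf_kernel(x, xp, gamma=0.5):
    """Radial basis: K(x, x') = exp(-gamma * ||x - x'||^2)."""
    return np.exp(-gamma * np.sum((x - xp) ** 2))

def nn_kernel(x, xp, kappa1=1.0, kappa2=0.0):
    """'Neural network' (sigmoid): K(x, x') = tanh(kappa1 * <x, x'> + kappa2)."""
    return np.tanh(kappa1 * (x @ xp) + kappa2)
```

Any of these can be plugged in for the inner product $\langle x_i, x_{i'} \rangle$ in the dual problem without changing the optimization itself.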