Regularization and Feature Selection
Legal Notices and Disclaimers
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES,
EXPRESS OR IMPLIED, IN THIS SUMMARY.
Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. Check with your system manufacturer or retailer or learn more at intel.com.
This sample source code is released under the Intel Sample Source Code License Agreement.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2017, Intel Corporation. All rights reserved.
Model Complexity vs Error
[Figure: error vs. model complexity, showing cross validation error J_cv(θ) and training error J_train(θ)]
Preventing Under- and Overfitting
• How to use a degree 9 polynomial and prevent overfitting?
[Figure: model vs. true function and samples (X, Y) for Polynomial Degree = 1, 3, and 9]
Preventing Under- and Overfitting
$$J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$$
[Figure: model vs. true function and samples (X, Y) for Polynomial Degree = 1, 3, and 9]
Regularization
$$J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2 + \lambda \sum_{j=1}^{k} \beta_j^2$$
[Figure: model vs. true function and samples for Poly Degree = 9 with λ = 0.0, 1e-5, and 0.1]
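As a concrete illustration of the effect of λ, here is a minimal sketch that fits a degree-9 polynomial with an L2 penalty at the three λ values pictured above; the synthetic data, pipeline helpers, and printed summary are illustrative assumptions.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import Ridge

    # Illustrative data: noisy samples of a smooth true function
    rng = np.random.RandomState(0)
    x = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
    y = np.cos(1.5 * np.pi * x.ravel()) + rng.normal(scale=0.1, size=20)

    # Fit a degree-9 polynomial for each regularization strength
    for lam in [0.0, 1e-5, 0.1]:
        model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=lam))
        model.fit(x, y)
        coefs = model.named_steps['ridge'].coef_
        print(f"lambda={lam}: largest |coefficient| = {np.abs(coefs).max():.3g}")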
Ridge Regression (L2)
$$J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2 + \lambda \sum_{j=1}^{k} \beta_j^2$$
• Penalty shrinks magnitude of all coefficients
• Larger coefficients strongly penalized because of the squaring
Effect of Ridge Regression on Parameters
$$J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2 + \lambda \sum_{j=1}^{k} \beta_j^2$$
[Figure: degree-9 polynomial fits and abs(coefficient) for coefficients 1–9 (log scale, 10^0 to 10^8) at λ = 0.0, 1e-5, and 0.1]
Lasso Regression (L1)
$$J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2 + \lambda \sum_{j=1}^{k} \left| \beta_j \right|$$
• Penalty selectively shrinks some coefficients
• Can be used for feature selection
• Slower to converge than Ridge regression
Effect of Lasso Regression on Parameters
$$J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2 + \lambda \sum_{j=1}^{k} \left| \beta_j \right|$$
[Figure: degree-9 polynomial fits and abs(coefficient) for coefficients 1–9 (log scale, 10^0 to 10^8) at λ = 0.0, 1e-5, and 0.1]
Elastic Net Regularization
$$J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2 + \lambda_1 \sum_{j=1}^{k} \left| \beta_j \right| + \lambda_2 \sum_{j=1}^{k} \beta_j^2$$
• Compromise of both Ridge and Lasso regression
• Requires tuning of an additional parameter that distributes the regularization penalty between L1 and L2
[Figure: model vs. true function and samples for Poly = 9 with λ1 = λ2 = 0.0, 1e-5, and 0.1]
Hyperparameters and Their Optimization
• Regularization coefficients (λ1 and λ2) are empirically determined
• Want a value that generalizes—do not use test data for tuning (use test data to tune λ? NO!)
• Create an additional split of the data to tune hyperparameters—the validation set
• Cross validation can also be used on the training data (see the sketch below)
[Figure: data split into Training Data and Test Data; λ is tuned with cross validation on a Validation Data split carved out of the training data]
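A minimal sketch of this tuning workflow, assuming a feature matrix X and target y already exist; the candidate alphas are illustrative.

    import numpy as np
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.linear_model import Ridge

    # Hold the test set out entirely; tune lambda (alpha) only on training/validation data
    X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)

    alphas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]

    # Option 1: score each candidate alpha on the validation split
    val_scores = [Ridge(alpha=a).fit(X_train, y_train).score(X_val, y_val) for a in alphas]
    best_alpha = alphas[int(np.argmax(val_scores))]

    # Option 2: cross validation on the training data only
    cv_scores = {a: cross_val_score(Ridge(alpha=a), X_trainval, y_trainval, cv=5).mean() for a in alphas}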
Ridge Regression: The Syntax
Import the class containing the regression method
from sklearn.linear_model import Ridge
Create an instance of the class
RR = Ridge(alpha=1.0)
(alpha is the regularization parameter)
Fit the instance on the data and then predict the expected value
RR = RR.fit(X_train, y_train)
y_predict = RR.predict(X_test)
The RidgeCV class will perform cross validation on a set of values for alpha.
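Since the slide points to RidgeCV, here is a minimal sketch of how it might be used; the candidate alphas and cv value are illustrative assumptions.

    from sklearn.linear_model import RidgeCV

    # Cross validate over candidate alphas, then refit on the training data
    RRcv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5)
    RRcv = RRcv.fit(X_train, y_train)
    print(RRcv.alpha_)               # alpha chosen by cross validation
    y_predict = RRcv.predict(X_test)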
Lasso Regression: The Syntax
Import the class containing the regression method
from sklearn.linear_model import Lasso
Create an instance of the class
LR = Lasso(alpha=1.0)
(alpha is the regularization parameter)
Fit the instance on the data and then predict the expected value
LR = LR.fit(X_train, y_train)
y_predict = LR.predict(X_test)
The LassoCV class will perform cross validation on a set of values for alpha.
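A matching sketch for LassoCV (candidate alphas and cv illustrative):

    from sklearn.linear_model import LassoCV

    # LassoCV selects alpha by cross validation along a path of candidate values
    LRcv = LassoCV(alphas=[0.001, 0.01, 0.1, 1.0], cv=5)
    LRcv = LRcv.fit(X_train, y_train)
    print(LRcv.alpha_)
    y_predict = LRcv.predict(X_test)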
Elastic Net Regression: The Syntax
Import the class containing the regression method
from sklearn.linear_model import ElasticNet
Create an instance of the class
EN = ElasticNet(alpha=1.0, l1_ratio=0.5)
(alpha is the regularization parameter; l1_ratio distributes alpha between L1 and L2)
Fit the instance on the data and then predict the expected value
EN = EN.fit(X_train, y_train)
y_predict = EN.predict(X_test)
The ElasticNetCV class will perform cross validation on a set of values for l1_ratio and alpha.
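And a sketch for ElasticNetCV, which searches over both l1_ratio and alpha (candidate grids illustrative):

    from sklearn.linear_model import ElasticNetCV

    # Cross validate over both the mixing ratio and the overall penalty strength
    ENcv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], alphas=[0.01, 0.1, 1.0], cv=5)
    ENcv = ENcv.fit(X_train, y_train)
    print(ENcv.alpha_, ENcv.l1_ratio_)
    y_predict = ENcv.predict(X_test)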
Feature Selection
• Regularization performs feature selection by shrinking the contribution of features
• For L1-regularization, this is accomplished by driving some coefficients to zero (see the sketch below)
• Feature selection can also be performed by removing features
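A short illustration of the zeroing behavior, assuming X_train and y_train already exist; the alpha value is illustrative:

    import numpy as np
    from sklearn.linear_model import Lasso

    # With a large enough alpha, L1 regularization drives some coefficients exactly to zero
    lasso = Lasso(alpha=0.5).fit(X_train, y_train)
    selected = np.flatnonzero(lasso.coef_)         # indices of features that survived
    print("selected feature indices:", selected)
    print("number of features dropped:", int(np.sum(lasso.coef_ == 0)))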
Why is Feature Selection Important?
• Reducing the number of features is another way to prevent overfitting (similar to regularization)
• For some models, fewer features can improve fitting time and/or results
• Identifying the most critical features can improve model interpretability
Recursive Feature Elimination: The Syntax
Import the class containing the feature selection method
from sklearn.feature_selection import RFE
Create an instance of the class
rfeMod = RFE(est, n_features_to_select=5)
(est is an instance of the model to use; n_features_to_select is the final number of features)
Fit the instance on the data and then predict the expected value
rfeMod = rfeMod.fit(X_train, y_train)
y_predict = rfeMod.predict(X_test)
The RFECV class will perform feature elimination using cross validation.
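A minimal sketch of RFECV, which also uses cross validation to choose how many features to keep; the LinearRegression base estimator is an illustrative choice:

    from sklearn.feature_selection import RFECV
    from sklearn.linear_model import LinearRegression

    # Recursively eliminate features, using cross validation to pick how many to keep
    rfecvMod = RFECV(LinearRegression(), step=1, cv=5)
    rfecvMod = rfecvMod.fit(X_train, y_train)
    print(rfecvMod.n_features_)     # number of features selected
    print(rfecvMod.support_)        # boolean mask over the original features
    y_predict = rfecvMod.predict(X_test)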
Gradient Descent
• Start with a cost function J(β)
• Then gradually move towards the minimum
[Figure: J(β) vs. β, with the global minimum marked]
Gradient Descent with Linear Regression
• Now imagine there are two parameters (β0, β1)
• This is a more complicated surface on which the minimum must be found
• How can we do this without knowing what J(β0, β1) looks like?
[Figure: cost surface J(β0, β1) over β0 and β1]
• Compute the gradient, ∇J(β0, β1), which points in the direction of the biggest increase!
• −∇J(β0, β1) (negative gradient) points to the biggest decrease at that point!
• The gradient is a vector whose coordinates consist of the partial derivatives with respect to the parameters:
$$\nabla J(\beta_0, \dots, \beta_n) = \left\langle \frac{\partial J}{\partial \beta_0}, \dots, \frac{\partial J}{\partial \beta_n} \right\rangle$$
[Figure: gradient direction on the cost surface J(β0, β1)]
• Then use the gradient (∇) and the cost function to calculate the next point (ω1) from the current one (ω0):
$$\omega_1 = \omega_0 - \alpha \nabla \frac{1}{2} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$$
• The learning rate (α) is a tunable parameter that determines the step size
[Figure: step from ω0 to ω1 on the cost surface J(β0, β1)]
• Each point can be iteratively calculated from the previous one (see the sketch below):
$$\omega_2 = \omega_1 - \alpha \nabla \frac{1}{2} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$$
$$\omega_3 = \omega_2 - \alpha \nabla \frac{1}{2} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$$
[Figure: steps ω0 → ω1 → ω2 → ω3 descending the cost surface J(β0, β1)]
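To make these updates concrete, here is a minimal NumPy sketch of batch gradient descent for the two-parameter model above; the synthetic data, learning rate, and iteration count are illustrative assumptions:

    import numpy as np

    # Illustrative data: y is roughly 2 + 3x plus noise
    rng = np.random.RandomState(0)
    x_obs = rng.uniform(0, 1, 100)
    y_obs = 2.0 + 3.0 * x_obs + rng.normal(scale=0.1, size=100)

    alpha = 0.01                       # learning rate (step size)
    beta0, beta1 = 0.0, 0.0            # omega_0: starting point

    for _ in range(2000):
        resid = beta0 + beta1 * x_obs - y_obs        # errors of the current line on all m points
        grad0 = resid.sum()                          # d/d(beta0) of (1/2) * sum(resid**2)
        grad1 = (resid * x_obs).sum()                # d/d(beta1) of (1/2) * sum(resid**2)
        beta0, beta1 = beta0 - alpha * grad0, beta1 - alpha * grad1   # omega_{k+1} = omega_k - alpha * gradient

    print(beta0, beta1)                # should end up close to (2, 3)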
Stochastic Gradient Descent
• Use a single data point to determine the gradient and cost function instead of all the data
• Batch update, using all m observations:
$$\omega_1 = \omega_0 - \alpha \nabla \frac{1}{2} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$$
• Stochastic updates, one observation at a time:
$$\omega_1 = \omega_0 - \alpha \nabla \frac{1}{2} \left( \beta_0 + \beta_1 x_{obs}^{(0)} - y_{obs}^{(0)} \right)^2$$
$$\dots$$
$$\omega_4 = \omega_3 - \alpha \nabla \frac{1}{2} \left( \beta_0 + \beta_1 x_{obs}^{(3)} - y_{obs}^{(3)} \right)^2$$
• Path is less direct due to noise in the single data point—"stochastic" (see the sketch below)
[Figure: noisy steps ω0 → ω1 → ω2 → ω3 → ω4 on the cost surface J(β0, β1)]
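Continuing the NumPy sketch above (same illustrative x_obs, y_obs, and rng), the stochastic variant updates the parameters from one shuffled observation at a time:

    # Stochastic variant: one observation per update, reshuffled each epoch
    alpha = 0.05
    beta0, beta1 = 0.0, 0.0
    for epoch in range(50):
        for i in rng.permutation(len(x_obs)):
            resid = beta0 + beta1 * x_obs[i] - y_obs[i]   # error on a single point
            beta0 -= alpha * resid                        # gradient of (1/2) * resid**2 w.r.t. beta0
            beta1 -= alpha * resid * x_obs[i]             # ... and w.r.t. beta1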
Mini Batch Gradient Descent
• Perform an update for every n training examples:
$$\omega_1 = \omega_0 - \alpha \nabla \frac{1}{2} \sum_{i=1}^{n} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$$
Best of both worlds:
• Reduced memory relative to "vanilla" gradient descent
• Less noisy than stochastic gradient descent
[Figure: step from ω0 to ω1 on the cost surface J(β0, β1)]
• Mini batch implementation typically used for neural nets
• Batch sizes range from 50–256 points
• Trade off between batch size and learning rate (α)
• Tailor the learning rate schedule: gradually reduce the learning rate during a given epoch
Stochastic Gradient Descent Regression: The Syntax
Import the class containing the regression model
from sklearn.linear_model import SGDRegressor
Create an instance of the class
SGDreg = SGDRegressor(loss='squared_loss', alpha=0.1, penalty='l2')
(squared_loss = linear regression; alpha and penalty are the regularization parameters)
Fit the instance on the data and then predict the expected value
SGDreg = SGDreg.fit(X_train, y_train)
y_pred = SGDreg.predict(X_test)
For the mini-batch version, call partial_fit instead of fit (see the sketch below):
SGDreg = SGDreg.partial_fit(X_train, y_train)
Other loss methods exist: epsilon_insensitive, huber, etc.
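The partial_fit call suggests a mini-batch training loop; a sketch of that pattern, assuming X_train and y_train are NumPy arrays, with an illustrative batch size and epoch count:

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    SGDreg = SGDRegressor(loss='squared_loss', alpha=0.1, penalty='l2')

    batch_size = 128
    for epoch in range(10):
        order = np.random.permutation(len(X_train))           # reshuffle every epoch
        for start in range(0, len(X_train), batch_size):
            idx = order[start:start + batch_size]
            SGDreg.partial_fit(X_train[idx], y_train[idx])     # one update per mini batch

    y_pred = SGDreg.predict(X_test)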
Stochastic Gradient Descent Classification: The Syntax
Import the class containing the classification model
from sklearn.linear_model import SGDClassifier
Create an instance of the class
SGDclass = SGDClassifier(loss='log', alpha=0.1, penalty='l2')
(log loss = logistic regression; alpha and penalty are the regularization parameters)
Fit the instance on the data and then predict the expected value
SGDclass = SGDclass.fit(X_train, y_train)
y_pred = SGDclass.predict(X_test)
For the mini-batch version, call partial_fit instead of fit (the first call must also be given the full list of classes; see the sketch below):
SGDclass = SGDclass.partial_fit(X_train, y_train)
Other loss methods exist: hinge, squared_hinge, etc. (see the SVM lecture, week 7)
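A matching sketch of mini-batch usage for the classifier (X_train and y_train assumed to be NumPy arrays; slice sizes illustrative):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    SGDclass = SGDClassifier(loss='log', alpha=0.1, penalty='l2')

    classes = np.unique(y_train)   # every label must be declared on the first partial_fit call
    SGDclass.partial_fit(X_train[:128], y_train[:128], classes=classes)
    SGDclass.partial_fit(X_train[128:256], y_train[128:256])   # later calls can omit classes

    y_pred = SGDclass.predict(X_test)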
Ot regularization and_gradient_descent

Mais conteúdo relacionado

Mais procurados

Ml5 svm and-kernels
Ml5 svm and-kernelsMl5 svm and-kernels
Ml5 svm and-kernelsankit_ppt
 
05 contours seg_matching
05 contours seg_matching05 contours seg_matching
05 contours seg_matchingankit_ppt
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlowBarbara Fusinska
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learningKien Le
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 
Introduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi ChenIntroduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi ChenZhuyi Xue
 
Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsananth
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2ananth
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basicsananth
 
Introduction to Deep learning and H2O for beginner's
Introduction to Deep learning and H2O for beginner'sIntroduction to Deep learning and H2O for beginner's
Introduction to Deep learning and H2O for beginner'sVidyasagar Bhargava
 
Algorithms Design Patterns
Algorithms Design PatternsAlgorithms Design Patterns
Algorithms Design PatternsAshwin Shiv
 
Deep learning simplified
Deep learning simplifiedDeep learning simplified
Deep learning simplifiedLovelyn Rose
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 
ICCV2013 reading: Learning to rank using privileged information
ICCV2013 reading: Learning to rank using privileged informationICCV2013 reading: Learning to rank using privileged information
ICCV2013 reading: Learning to rank using privileged informationAkisato Kimura
 
Aaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-22-Artificial Neural Network: Introduction to ANNAaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-22-Artificial Neural Network: Introduction to ANNAminaRepo
 

Mais procurados (19)

Ml5 svm and-kernels
Ml5 svm and-kernelsMl5 svm and-kernels
Ml5 svm and-kernels
 
05 contours seg_matching
05 contours seg_matching05 contours seg_matching
05 contours seg_matching
 
GBM theory code and parameters
GBM theory code and parametersGBM theory code and parameters
GBM theory code and parameters
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlow
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Introduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi ChenIntroduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi Chen
 
Generative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variantsGenerative Adversarial Networks : Basic architecture and variants
Generative Adversarial Networks : Basic architecture and variants
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
 
Neural Networks made easy
Neural Networks made easyNeural Networks made easy
Neural Networks made easy
 
Machine Learning Lecture 2 Basics
Machine Learning Lecture 2 BasicsMachine Learning Lecture 2 Basics
Machine Learning Lecture 2 Basics
 
Introduction to Deep learning and H2O for beginner's
Introduction to Deep learning and H2O for beginner'sIntroduction to Deep learning and H2O for beginner's
Introduction to Deep learning and H2O for beginner's
 
Algorithms Design Patterns
Algorithms Design PatternsAlgorithms Design Patterns
Algorithms Design Patterns
 
Deep learning simplified
Deep learning simplifiedDeep learning simplified
Deep learning simplified
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
ICCV2013 reading: Learning to rank using privileged information
ICCV2013 reading: Learning to rank using privileged informationICCV2013 reading: Learning to rank using privileged information
ICCV2013 reading: Learning to rank using privileged information
 
Aaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-22-Artificial Neural Network: Introduction to ANNAaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-22-Artificial Neural Network: Introduction to ANN
 
Ppt shuai
Ppt shuaiPpt shuai
Ppt shuai
 

Semelhante a Ot regularization and_gradient_descent

Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee AttritionShruti Mohan
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsankit_ppt
 
Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedOmid Vahdaty
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsankit_ppt
 
wk5ppt1_Titanic
wk5ppt1_Titanicwk5ppt1_Titanic
wk5ppt1_TitanicAliciaWei1
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习AdaboostShocky1
 
Machine Learning Notes for beginners ,Step by step
Machine Learning Notes for beginners ,Step by stepMachine Learning Notes for beginners ,Step by step
Machine Learning Notes for beginners ,Step by stepSanjanaSaxena17
 
Techniques in Deep Learning
Techniques in Deep LearningTechniques in Deep Learning
Techniques in Deep LearningSourya Dey
 
Inference & Learning in Linear-Chain Conditional Random Fields (CRFs)
Inference & Learning in Linear-Chain Conditional Random Fields (CRFs)Inference & Learning in Linear-Chain Conditional Random Fields (CRFs)
Inference & Learning in Linear-Chain Conditional Random Fields (CRFs)Anmol Dwivedi
 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Yao Yao
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Yao Yao
 
Data analysis on bank data
Data analysis on bank dataData analysis on bank data
Data analysis on bank dataANISH BHANUSHALI
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in RBabu Priyavrat
 
Presentation
PresentationPresentation
Presentationbutest
 

Semelhante a Ot regularization and_gradient_descent (20)

Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topics
 
Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data Demystified
 
wk5ppt2_Iris
wk5ppt2_Iriswk5ppt2_Iris
wk5ppt2_Iris
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metrics
 
wk5ppt1_Titanic
wk5ppt1_Titanicwk5ppt1_Titanic
wk5ppt1_Titanic
 
Ch 6 randomization
Ch 6 randomizationCh 6 randomization
Ch 6 randomization
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
working with python
working with pythonworking with python
working with python
 
Machine Learning Notes for beginners ,Step by step
Machine Learning Notes for beginners ,Step by stepMachine Learning Notes for beginners ,Step by step
Machine Learning Notes for beginners ,Step by step
 
TDD Training
TDD TrainingTDD Training
TDD Training
 
Techniques in Deep Learning
Techniques in Deep LearningTechniques in Deep Learning
Techniques in Deep Learning
 
Inference & Learning in Linear-Chain Conditional Random Fields (CRFs)
Inference & Learning in Linear-Chain Conditional Random Fields (CRFs)Inference & Learning in Linear-Chain Conditional Random Fields (CRFs)
Inference & Learning in Linear-Chain Conditional Random Fields (CRFs)
 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...
 
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
BPstudy sklearn 20180925
BPstudy sklearn 20180925BPstudy sklearn 20180925
BPstudy sklearn 20180925
 
Data analysis on bank data
Data analysis on bank dataData analysis on bank data
Data analysis on bank data
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in R
 
Presentation
PresentationPresentation
Presentation
 

Mais de ankit_ppt

Deep learning summary
Deep learning summaryDeep learning summary
Deep learning summaryankit_ppt
 
08 neural networks
08 neural networks08 neural networks
08 neural networksankit_ppt
 
04 image transformations_ii
04 image transformations_ii04 image transformations_ii
04 image transformations_iiankit_ppt
 
03 image transformations_i
03 image transformations_i03 image transformations_i
03 image transformations_iankit_ppt
 
01 foundations
01 foundations01 foundations
01 foundationsankit_ppt
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measuresankit_ppt
 
Text generation and_advanced_topics
Text generation and_advanced_topicsText generation and_advanced_topics
Text generation and_advanced_topicsankit_ppt
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesankit_ppt
 
Latent dirichlet allocation_and_topic_modeling
Latent dirichlet allocation_and_topic_modelingLatent dirichlet allocation_and_topic_modeling
Latent dirichlet allocation_and_topic_modelingankit_ppt
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlpankit_ppt
 
Ml8 boosting and-stacking
Ml8 boosting and-stackingMl8 boosting and-stacking
Ml8 boosting and-stackingankit_ppt
 
Ml6 decision trees
Ml6 decision treesMl6 decision trees
Ml6 decision treesankit_ppt
 
Ml4 naive bayes
Ml4 naive bayesMl4 naive bayes
Ml4 naive bayesankit_ppt
 
Lesson 3 ai in the enterprise
Lesson 3   ai in the enterpriseLesson 3   ai in the enterprise
Lesson 3 ai in the enterpriseankit_ppt
 
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsMl1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsankit_ppt
 
Lesson 2 ai in industry
Lesson 2   ai in industryLesson 2   ai in industry
Lesson 2 ai in industryankit_ppt
 
Lesson 5 arima
Lesson 5 arimaLesson 5 arima
Lesson 5 arimaankit_ppt
 

Mais de ankit_ppt (18)

Deep learning summary
Deep learning summaryDeep learning summary
Deep learning summary
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
 
04 image transformations_ii
04 image transformations_ii04 image transformations_ii
04 image transformations_ii
 
03 image transformations_i
03 image transformations_i03 image transformations_i
03 image transformations_i
 
01 foundations
01 foundations01 foundations
01 foundations
 
Word2 vec
Word2 vecWord2 vec
Word2 vec
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measures
 
Text generation and_advanced_topics
Text generation and_advanced_topicsText generation and_advanced_topics
Text generation and_advanced_topics
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniques
 
Latent dirichlet allocation_and_topic_modeling
Latent dirichlet allocation_and_topic_modelingLatent dirichlet allocation_and_topic_modeling
Latent dirichlet allocation_and_topic_modeling
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
Ml8 boosting and-stacking
Ml8 boosting and-stackingMl8 boosting and-stacking
Ml8 boosting and-stacking
 
Ml6 decision trees
Ml6 decision treesMl6 decision trees
Ml6 decision trees
 
Ml4 naive bayes
Ml4 naive bayesMl4 naive bayes
Ml4 naive bayes
 
Lesson 3 ai in the enterprise
Lesson 3   ai in the enterpriseLesson 3   ai in the enterprise
Lesson 3 ai in the enterprise
 
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighborsMl1 introduction to-supervised_learning_and_k_nearest_neighbors
Ml1 introduction to-supervised_learning_and_k_nearest_neighbors
 
Lesson 2 ai in industry
Lesson 2   ai in industryLesson 2   ai in industry
Lesson 2 ai in industry
 
Lesson 5 arima
Lesson 5 arimaLesson 5 arima
Lesson 5 arima
 

Último

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Call Girls Mumbai
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptNANDHAKUMARA10
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEselvakumar948
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 

Último (20)

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 

Ot regularization and_gradient_descent

  • 2. Legal Notices and Disclaimers This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com. This sample source code is released under the Intel Sample Source Code License Agreement. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright © 2017, Intel Corporation. All rights reserved.
  • 3. Model Complexity vs Error error 𝐽𝑐𝑣 𝜃 cross validation error 𝐽𝑡𝑟𝑎𝑖𝑛 𝜃 training error
  • 4. Preventing Under- and Overfitting • How to use a degree 9 polynomial and prevent overfitting? X Y Model True Function Samples X Y X Y Polynomial Degree = 1 Polynomial Degree = 3 Polynomial Degree = 9
  • 5. Preventing Under- and Overfitting 𝐽 𝛽0, 𝛽1 = 1 2𝑚 𝑖=1 𝑚 𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠 (𝑖) − 𝑦𝑜𝑏𝑠 (𝑖) 2 X Y Model True Function Samples X Y X Y Polynomial Degree = 1 Polynomial Degree = 3 Polynomial Degree = 9
  • 6. Regularization 𝐽 𝛽0, 𝛽1 = 1 2𝑚 𝑖=1 𝑚 𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠 (𝑖) − 𝑦𝑜𝑏𝑠 (𝑖) 2 + 𝜆 𝑗=1 𝑘 𝛽𝑗 2 X Y Model True Function Samples X Y X Y Poly Degree=9, 𝜆=0.1Poly Degree=9, 𝜆=1e-5Poly Degree=9, 𝜆=0.0
  • 7. 𝐽 𝛽0, 𝛽1 = 1 2𝑚 𝑖=1 𝑚 𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠 (𝑖) − 𝑦𝑜𝑏𝑠 (𝑖) 2 + 𝜆 𝑗=1 𝑘 𝛽𝑗 2 Ridge Regression (L2) • Penalty shrinks magnitude of all coefficients • Larger coefficients strongly penalized because of the squaring
  • 8. Effect of Ridge Regression on Parameters 𝐽 𝛽0, 𝛽1 = 1 2𝑚 𝑖=1 𝑚 𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠 (𝑖) − 𝑦𝑜𝑏𝑠 (𝑖) 2 + 𝜆 𝑗=1 𝑘 𝛽𝑗 2 X Model True Function Samples XX Y Poly=9, 𝜆=0.1Poly=9, 𝜆=1e-5Poly=9, 𝜆=0.0 1 abs(coefficient) 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 100 102 104 106 108
  • 9. 𝐽 𝛽0, 𝛽1 = 1 2𝑚 𝑖=1 𝑚 𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠 (𝑖) − 𝑦𝑜𝑏𝑠 (𝑖) 2 + 𝜆 𝑗=1 𝑘 𝛽𝑗 Lasso Regression (L1) • Penalty selectively shrinks some coefficients • Can be used for feature selection • Slower to converge than Ridge regression
  • 10. Effect of Lasso Regression on Parameters 𝐽 𝛽0, 𝛽1 = 1 2𝑚 𝑖=1 𝑚 𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠 (𝑖) − 𝑦𝑜𝑏𝑠 (𝑖) 2 + 𝜆 𝑗=1 𝑘 𝛽𝑗 X Model True Function Samples XX Y Poly=9, 𝜆=0.1Poly=9, 𝜆=1e-5Poly=9, 𝜆=0.0 1 abs(coefficient) 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 100 102 104 106 108
  • 11. 𝐽 𝛽0, 𝛽1 = 1 2𝑚 𝑖=1 𝑚 𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠 (𝑖) − 𝑦𝑜𝑏𝑠 (𝑖) 2 + 𝜆1 𝑗=1 𝑘 𝛽𝑗 + 𝜆2 𝑗=1 𝑘 𝛽𝑗 2 Elastic Net Regularization • Compromise of both Ridge and Lasso regression • Requires tuning of additional parameter that distributes regularization penalty between L1 and L2
  • 12. 𝐽 𝛽0, 𝛽1 = 1 2𝑚 𝑖=1 𝑚 𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠 (𝑖) − 𝑦𝑜𝑏𝑠 (𝑖) 2 + 𝜆1 𝑗=1 𝑘 𝛽𝑗 + 𝜆2 𝑗=1 𝑘 𝛽𝑗 2 X Y Model True Function Samples X Y X Y Poly=9, 𝜆1=𝜆2=0.1Poly=9, 𝜆1=𝜆2=1e-5Poly=9, 𝜆1=𝜆2=0.0 Elastic Net Regularization
  • 13. Hyperparameters and Their Optimization • Regularization coefficients (𝜆1 and 𝜆2) are empirically determined • Want value that generalizes—do not use test data for tuning • Create additional split of data to tune hyperparameters—validation set • Cross validation can also be used on training data Training Data Test Data Use Test Data to Tune 𝜆?
  • 14. Hyperparameters and Their Optimization • Regularization coefficients (𝜆1 and 𝜆2) are empirically determined • Want value that generalizes—do not use test data for tuning • Create additional split of data to tune hyperparameters—validation set • Cross validation can also be used on training data Training Data Test Data NO! Use Test Data to Tune 𝜆?
  • 15. Hyperparameters and Their Optimization • Regularization coefficients (𝜆1 and 𝜆2) are empirically determined • Want value that generalizes—do not use test data for tuning • Create additional split of data to tune hyperparameters—validation set • Cross validation can also be used on training data Training Data Test Data Tune 𝜆 with Cross Validation Validation Data
  • 16. Import the class containing the regression method from sklearn.linear_model import Ridge Create an instance of the class RR = Ridge(alpha=1.0) Fit the instance on the data and then predict the expected value RR = RR.fit(X_train, y_train) y_predict = RR.predict(X_test) The RidgeCV class will perform cross validation on a set of values for alpha. Ridge Regression: The Syntax
  • 17. Import the class containing the regression method from sklearn.linear_model import Ridge Create an instance of the class RR = Ridge(alpha=1.0) Fit the instance on the data and then predict the expected value RR = RR.fit(X_train, y_train) y_predict = RR.predict(X_test) The RidgeCV class will perform cross validation on a set of values for alpha. Ridge Regression: The Syntax
  • 18. Import the class containing the regression method from sklearn.linear_model import Ridge Create an instance of the class RR = Ridge(alpha=1.0) Fit the instance on the data and then predict the expected value RR = RR.fit(X_train, y_train) y_predict = RR.predict(X_test) The RidgeCV class will perform cross validation on a set of values for alpha. Ridge Regression: The Syntax regularization parameter
  • 19. Import the class containing the regression method from sklearn.linear_model import Ridge Create an instance of the class RR = Ridge(alpha=1.0) Fit the instance on the data and then predict the expected value RR = RR.fit(X_train, y_train) y_predict = RR.predict(X_test) The RidgeCV class will perform cross validation on a set of values for alpha. Ridge Regression: The Syntax
  • 20. Ridge Regression: The Syntax Import the class containing the regression method from sklearn.linear_model import Ridge Create an instance of the class RR = Ridge(alpha=1.0) Fit the instance on the data and then predict the expected value RR = RR.fit(X_train, y_train) y_predict = RR.predict(X_test) The RidgeCV class will perform cross validation on a set of values for alpha.
  • 21. Lasso Regression: The Syntax Import the class containing the regression method from sklearn.linear_model import Lasso Create an instance of the class LR = Lasso(alpha=1.0) Fit the instance on the data and then predict the expected value LR = LR.fit(X_train, y_train) y_predict = LR.predict(X_test) The LassoCV class will perform cross validation on a set of values for alpha.
  • 22. Lasso Regression: The Syntax Import the class containing the regression method from sklearn.linear_model import Lasso Create an instance of the class LR = Lasso(alpha=1.0) Fit the instance on the data and then predict the expected value LR = LR.fit(X_train, y_train) y_predict = LR.predict(X_test) The LassoCV class will perform cross validation on a set of values for alpha. regularization parameter
  • 23. Elastic Net Regression: The Syntax Import the class containing the regression method from sklearn.linear_model import ElasticNet Create an instance of the class EN = ElasticNet(alpha=1.0, l1_ratio=0.5) Fit the instance on the data and then predict the expected value EN = EN.fit(X_train, y_train) y_predict = EN.predict(X_test) The ElasticNetCV class will perform cross validation on a set of values for l1_ratio and alpha.
  • 24. Elastic Net Regression: The Syntax alpha is the regularization parameter Import the class containing the regression method from sklearn.linear_model import ElasticNet Create an instance of the class EN = ElasticNet(alpha=1.0, l1_ratio=0.5) Fit the instance on the data and then predict the expected value EN = EN.fit(X_train, y_train) y_predict = EN.predict(X_test) The ElasticNetCV class will perform cross validation on a set of values for l1_ratio and alpha.
  • 25. Elastic Net Regression: The Syntax l1_ratio distributes alpha to L1/L2 Import the class containing the regression method from sklearn.linear_model import ElasticNet Create an instance of the class EN = ElasticNet(alpha=1.0, l1_ratio=0.5) Fit the instance on the data and then predict the expected value EN = EN.fit(X_train, y_train) y_predict = EN.predict(X_test) The ElasticNetCV class will perform cross validation on a set of values for l1_ratio and alpha.
  • 26. Feature Selection • Regularization performs feature selection by shrinking the contribution of features • For L1-regularization, this is accomplished by driving some coefficients to zero • Feature selection can also be performed by removing features
  • 27. Feature Selection • Regularization performs feature selection by shrinking the contribution of features • For L1-regularization, this is accomplished by driving some coefficients to zero • Feature selection can also be performed by removing features
  • 28. Feature Selection • Regularization performs feature selection by shrinking the contribution of features • For L1-regularization, this is accomplished by driving some coefficients to zero • Feature selection can also be performed by removing features
  • 29. Why is Feature Selection Important? • Reducing the number of features is another way to prevent overfitting (similar to regularization) • For some models, fewer features can improve fitting time and/or results • Identifying most critical features can improve model interpretability
  • 30. Why is Feature Selection Important? • Reducing the number of features is another way to prevent overfitting (similar to regularization) • For some models, fewer features can improve fitting time and/or results • Identifying most critical features can improve model interpretability
  • 31. Why is Feature Selection Important? • Reducing the number of features is another way to prevent overfitting (similar to regularization) • For some models, fewer features can improve fitting time and/or results • Identifying most critical features can improve model interpretability
  • 32. Recursive Feature Elimination: The Syntax Import the class containing the feature selection method from sklearn.feature_selection import RFE Create an instance of the class rfeMod = RFE(est, n_features_to_select=5) Fit the instance on the data and then predict the expected value rfeMod = rfeMod.fit(X_train, y_train) y_predict = rfeMod.predict(X_test) The RFECV class will perform feature elimination using cross validation.
  • 33. Recursive Feature Elimination: The Syntax est is an instance of the model to use Import the class containing the feature selection method from sklearn.feature_selection import RFE Create an instance of the class rfeMod = RFE(est, n_features_to_select=5) Fit the instance on the data and then predict the expected value rfeMod = rfeMod.fit(X_train, y_train) y_predict = rfeMod.predict(X_test) The RFECV class will perform feature elimination using cross validation.
  • 34. Recursive Feature Elimination: The Syntax Import the class containing the feature selection method from sklearn.feature_selection import RFE Create an instance of the class rfeMod = RFE(est, n_features_to_select=5) Fit the instance on the data and then predict the expected value rfeMod = rfeMod.fit(X_train, y_train) y_predict = rfeMod.predict(X_test) The RFECV class will perform feature elimination using cross validation. final number of features
  • 35.
  • 37. 𝐽 𝛽 Gradient Descent 𝛽 Start with a cost function J(𝛽):
  • 38. Then gradually move towards the minimum. 𝐽 𝛽 Gradient Descent 𝛽 Start with a cost function J(𝛽): Global Minimum
  • 39. Gradient Descent with Linear Regression • Now imagine there are two parameters (𝛽0, 𝛽1) • This is a more complicated surface on which the minimum must be found • How can we do this without knowing what 𝐽 𝛽0, 𝛽1 looks like? 𝐽 𝛽0, 𝛽1 𝛽1 𝛽0
  • 40. Gradient Descent with Linear Regression • Now imagine there are two parameters (𝛽0, 𝛽1) • This is a more complicated surface on which the minimum must be found • How can we do this without knowing what 𝐽 𝛽0, 𝛽1 looks like? 𝐽 𝛽0, 𝛽1 𝛽1 𝛽0
  • 41. Gradient Descent with Linear Regression • Now imagine there are two parameters (𝛽0, 𝛽1) • This is a more complicated surface on which the minimum must be found • How can we do this without knowing what 𝐽 𝛽0, 𝛽1 looks like? 𝐽 𝛽0, 𝛽1 𝛽1 𝛽0
  • 42. Gradient Descent with Linear Regression • Compute the gradient, 𝛻𝐽(𝛽0, 𝛽1), which points in the direction of the biggest increase • −𝛻𝐽(𝛽0, 𝛽1) (the negative gradient) points in the direction of the biggest decrease at that point
  • 43. Gradient Descent with Linear Regression • The gradient is a vector whose coordinates are the partial derivatives of the cost function with respect to the parameters: $\nabla J(\beta_0, \ldots, \beta_n) = \left\langle \frac{\partial J}{\partial \beta_0}, \ldots, \frac{\partial J}{\partial \beta_n} \right\rangle$
  • 44. Gradient Descent with Linear Regression • Then use the gradient (𝛻) and the cost function to calculate the next point (𝜔1) from the current one (𝜔0): $\omega_1 = \omega_0 - \alpha \nabla \tfrac{1}{2} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$ • The learning rate (𝛼) is a tunable parameter that determines step size
  • 46. Gradient Descent with Linear Regression • Each point can be iteratively calculated from the previous one: $\omega_2 = \omega_1 - \alpha \nabla \tfrac{1}{2} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$, $\omega_3 = \omega_2 - \alpha \nabla \tfrac{1}{2} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$, and so on
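To make the update rule concrete, a minimal sketch of batch gradient descent for simple linear regression, assuming NumPy; the data, learning rate, and iteration count are illustrative choices:

    import numpy as np

    # Illustrative data: y is roughly 2 + 3x plus noise
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 100)
    y = 2 + 3 * x + rng.normal(0, 0.1, 100)

    beta0, beta1 = 0.0, 0.0   # start from an arbitrary point on the cost surface
    alpha = 0.1               # learning rate: step size along the negative gradient
    m = len(x)

    for _ in range(5000):
        residual = beta0 + beta1 * x - y          # (beta0 + beta1*x_obs - y_obs) for every sample
        grad_b0 = residual.sum() / m              # dJ/dbeta0 for J = (1/2m) * sum(residual**2)
        grad_b1 = (residual * x).sum() / m        # dJ/dbeta1
        beta0 -= alpha * grad_b0                  # step in the direction of the negative gradient
        beta1 -= alpha * grad_b1

    print(beta0, beta1)   # should end up close to 2 and 3

This sketch uses the averaged cost (the 1/2m version from the earlier cost-function slides) rather than the unscaled sum, so the gradient magnitude does not grow with the number of samples.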
  • 48. Stochastic Gradient Descent • Use a single data point to determine the gradient and cost function instead of all the data: the batch update $\omega_1 = \omega_0 - \alpha \nabla \tfrac{1}{2} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$ becomes $\omega_1 = \omega_0 - \alpha \nabla \tfrac{1}{2} \left( \beta_0 + \beta_1 x_{obs}^{(0)} - y_{obs}^{(0)} \right)^2$
  • 51. Stochastic Gradient Descent • Use a single data point to determine the gradient and cost function instead of all the data: $\omega_1 = \omega_0 - \alpha \nabla \tfrac{1}{2} \left( \beta_0 + \beta_1 x_{obs}^{(0)} - y_{obs}^{(0)} \right)^2$, …, $\omega_4 = \omega_3 - \alpha \nabla \tfrac{1}{2} \left( \beta_0 + \beta_1 x_{obs}^{(3)} - y_{obs}^{(3)} \right)^2$ • The path is less direct due to the noise in a single data point, hence "stochastic"
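The same illustrative loop can be made stochastic by updating on one observation at a time; a minimal sketch, reusing the x, y, alpha, m, rng, and coefficient names from the batch example above (which were themselves hypothetical):

    for _ in range(100):                              # one pass over the data = one epoch
        for i in rng.permutation(m):                  # visit the samples in random order
            residual = beta0 + beta1 * x[i] - y[i]    # error on a single observation
            beta0 -= alpha * residual                 # gradient of (1/2)*(beta0 + beta1*x_i - y_i)**2 wrt beta0
            beta1 -= alpha * residual * x[i]          # gradient of the same single-point cost wrt beta1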
  • 53. Mini Batch Gradient Descent • Perform an update for every 𝑛 training examples: $\omega_1 = \omega_0 - \alpha \nabla \tfrac{1}{2} \sum_{i=1}^{n} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$ • Best of both worlds: reduced memory relative to "vanilla" (batch) gradient descent, and less noisy than stochastic gradient descent
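A minimal mini-batch variant of the same illustrative loop, again reusing x, y, alpha, m, rng, and the coefficients from the sketches above; the batch size and epoch count are arbitrary choices:

    batch_size = 20
    for _ in range(200):                                   # epochs
        order = rng.permutation(m)                         # shuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]          # the next n training examples
            residual = beta0 + beta1 * x[idx] - y[idx]
            beta0 -= alpha * residual.mean()               # average the gradient over the mini batch
            beta1 -= alpha * (residual * x[idx]).mean()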
  • 56. Mini Batch Gradient Descent • The mini batch implementation is typically used for neural nets • Batch sizes typically range from 50–256 points • There is a trade-off between batch size and learning rate (𝛼) • Tailor the learning rate schedule: gradually reduce the learning rate during a given epoch
  • 60. Stochastic Gradient Descent Regression: The Syntax
Import the class containing the regression model:
    from sklearn.linear_model import SGDRegressor
Create an instance of the class (loss='squared_loss' gives linear regression; alpha and penalty are the regularization parameters):
    SGDreg = SGDRegressor(loss='squared_loss', alpha=0.1, penalty='l2')
Fit the instance on the data and then predict the expected value:
    SGDreg = SGDreg.fit(X_train, y_train)
    y_pred = SGDreg.predict(X_test)
For the mini-batch version, fit incrementally with partial_fit:
    SGDreg = SGDreg.partial_fit(X_train, y_train)
Other loss methods exist: epsilon_insensitive, huber, etc.
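A minimal end-to-end sketch of this regression call pattern, assuming scikit-learn; the synthetic data, the scaling step, and the parameter values are illustrative choices rather than part of the slides:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Gradient descent is sensitive to feature scale, so standardize first
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # 'squared_loss' matches the slides; newer scikit-learn releases renamed it 'squared_error'
    SGDreg = SGDRegressor(loss='squared_loss', alpha=0.1, penalty='l2')
    SGDreg = SGDreg.fit(X_train, y_train)
    y_pred = SGDreg.predict(X_test)
    print(SGDreg.score(X_test, y_test))   # R^2 on the held-out data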
  • 67. Stochastic Gradient Descent Classification: The Syntax
Import the class containing the classification model:
    from sklearn.linear_model import SGDClassifier
Create an instance of the class (log loss gives logistic regression):
    SGDclass = SGDClassifier(loss='log', alpha=0.1, penalty='l2')
Fit the instance on the data and then predict the expected value:
    SGDclass = SGDclass.fit(X_train, y_train)
    y_pred = SGDclass.predict(X_test)
For the mini-batch version, fit incrementally with partial_fit (the first call also needs the full list of classes, e.g. classes=np.unique(y_train)):
    SGDclass = SGDclass.partial_fit(X_train, y_train, classes=np.unique(y_train))
Other loss methods exist: hinge, squared_hinge, etc. (see the SVM lecture, week 7).
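A minimal end-to-end sketch of the classification call pattern, assuming scikit-learn and NumPy; the synthetic data, the scaling step, and the batching scheme are illustrative choices:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    scaler = StandardScaler().fit(X_train)            # gradient descent is sensitive to feature scale
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # 'log' matches the slides; newer scikit-learn releases renamed it 'log_loss'
    SGDclass = SGDClassifier(loss='log', alpha=0.1, penalty='l2')
    SGDclass = SGDclass.fit(X_train, y_train)
    y_pred = SGDclass.predict(X_test)
    print((y_pred == y_test).mean())                  # accuracy on the held-out data

    # Mini-batch / out-of-core alternative: feed the data in chunks via partial_fit
    SGDclass2 = SGDClassifier(loss='log', alpha=0.1, penalty='l2')
    for batch in np.array_split(np.arange(len(X_train)), 10):
        SGDclass2.partial_fit(X_train[batch], y_train[batch], classes=np.unique(y_train))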