Machine Learning
Instructor: Dr. Emad Nabil
Linear regression with one variable
Model representation
Credits: This material is based mainly on Andrew Ng’s Coursera ML course, edited by Dr. Emad Nabil
Model Representation And Hypothesis
Supervised Learning
Formally stated, the aim of a supervised learning algorithm is to use the training dataset to output
a hypothesis function h, where h takes an input instance and predicts its output based
on what was learned from the training dataset.
[Training set layout: a table of m examples (rows) by n features f1, f2, f3, …, fn (columns), plus a label column]
As shown above, given a dataset where each training instance consists of the area of a house and its
price, the job of the learning algorithm is to produce a hypothesis h that takes the size of a house
as input and predicts its price.
For example, for Linear Regression in One Variable (Univariate Linear Regression), the hypothesis h
is given by hθ(x) = θ0 + θ1x.
Hypothesis
The hypothesis is the function to be learned by the algorithm during training; it is then used to make
predictions about unseen data.
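As a concrete illustration (a minimal sketch, not from the slides; the function name is mine), the univariate hypothesis is a one-line Python function:

```python
import numpy as np

def h(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x)
```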
Cost Function of Linear Regression
[Scatter plot of the housing data: size in feet² on the x-axis, price in $1000s on the y-axis, with a candidate regression line]
Cost Function of Linear Regression
The objective of the learning algorithm is to find the best parameters to fit the dataset, i.e. to
choose θ0 and θ1 so that hθ(x) is close to y for the training examples (x, y). We fit the model by
minimizing the mean of squared errors (least-squares linear regression), which can be represented
mathematically as:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $h_\theta(x^{(i)})$ is the value predicted by h and $y^{(i)}$ is the actual value, and the
objective is $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
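Translated directly into Python (a self-contained sketch; the names are illustrative), the cost function is:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1/2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(y)
    errors = theta0 + theta1 * np.asarray(x) - np.asarray(y)  # h(x) - y
    return np.sum(errors ** 2) / (2 * m)
```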
Cost function visualization
Consider a simple case of the hypothesis by setting θ0 = 0, so that h becomes hθ(x) = θ1x.
This corresponds to lines passing through the origin, since the y-intercept θ0 is nulled out;
each value of θ1 gives a different hypothesis, as θ1 is the slope of the line.
[Plots of this simple hypothesis and its cost: at θ1 = 1, J(1); at θ1 = 2, J(2); at θ1 = 0.5, J(0.5)]
Cost function visualization
Plotting more points like this, one obtains the following graph of the cost function as a
function of the parameter θ1; each value of θ1 on the plot corresponds to a different hypothesis.
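To reproduce these points numerically, assume the classic toy dataset where the training points lie exactly on the line y = x (the slides only show the plots, so this data is an assumption, chosen to be consistent with J(1) = 0):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])  # assumed: points lie exactly on y = x

for theta1 in (0.5, 1.0, 2.0):
    J = np.sum((theta1 * x - y) ** 2) / (2 * len(y))  # theta0 fixed at 0
    print(f"J({theta1}) = {J:.4f}")
# J(1.0) = 0 is the minimum; J(0.5) and J(2.0) are larger
```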
Cost function visualization
What is the optimal value of θ1, the one that minimizes J(θ1)?
It is clear that the best value is θ1 = 1, since J(1) = 0, the minimum.
How do we find the best value for θ1 in general? Plotting is not practical, especially in high
dimensions. The solutions:
1. Analytical solution: not applicable for large datasets.
2. Numerical solution: e.g., gradient descent.
Plotting the cost function J(θ0, θ1)
Each point (like the red one in the contour plot) represents a pair of parameter values for θ0 and θ1.
[Left panel: hθ(x) — for fixed θ0, θ1, this is a function of x. Right panel: J(θ0, θ1) — a function of the parameters.]
Gradient descent
To reach the lowest point, we should move the ball step by step in the direction of the negative
gradient (the direction of steepest descent).
Imagine that this is the landscape of a grassy park, and you want to get to the lowest point in
the park as rapidly as possible.
[Surface plot of J(θ0, θ1) over the (θ0, θ1) plane: red means high, blue means low; from the starting point, gradient descent walks downhill to a local minimum]
With a different starting point, the descent can end somewhere else:
[The same surface with a new starting point leading to a new local minimum; red means high, blue means low]
Have some function J(θ0, θ1); want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
Outline:
• Start with some θ0, θ1.
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum.
Gradient Descent
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.
There are two basic steps involved in this algorithm:
• Start with some random values of θ0, θ1.
• Keep updating the values of θ0, θ1 to reduce J(θ0, θ1) until a minimum is reached.
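Concretely, each iteration applies the following update (reconstructed from the course material; note that both parameters must be updated simultaneously):

```latex
\text{repeat until convergence: } \quad
\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
\qquad \text{simultaneously for } j = 0 \text{ and } j = 1
```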
Understanding Gradient Descent
Consider a simpler cost function J(θ1) and the objective $\min_{\theta_1} J(\theta_1)$. The update rule becomes:

repeat {
    $\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$
}

If the slope is positive, then since α is a positive real number, the overall term
$-\alpha \frac{d}{d\theta_1} J(\theta_1)$ is negative. This means the value of θ1 will decrease,
moving it toward the minimum as shown by the arrows in the plot.
If instead the initialization is at a point B where the slope of the line is negative, the update
term is positive, leading to an increase in the value of θ1, as shown in the plot.
So, no matter where θ1 is initialized, the algorithm ensures that the parameter is updated in the
right direction, toward the minimum, provided a proper value for α is chosen.
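A minimal numeric sketch of this sign behavior on the toy data from earlier (the values are illustrative, not from the slides): starting on either side of the minimum, θ1 moves toward it.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])  # minimum of J(theta1) is at theta1 = 1
alpha = 0.1

for start in (3.0, -1.0):      # right of the minimum, then left of it
    theta1 = start
    for _ in range(30):
        grad = np.mean((theta1 * x - y) * x)  # d/dtheta1 of J(theta1)
        theta1 -= alpha * grad                # positive slope -> decrease, and vice versa
    print(f"start={start}: theta1 converged to {theta1:.4f}")
```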
How can we choose the learning rate α?
[Plot: gradient descent steps on J(θ1) for different values of α in the update rule]
The learning rate α vs. the cost function J(θ1):
• The yellow plot shows divergence: the learning rate is so high that the update steps overshoot and the cost grows.
• The green plot shows a learning rate that is not as large, but still high enough that the steps keep oscillating around a point that is not the minimum.
• The red plot is the desirable curve: the cost drops steeply at first and then saturates very close to the optimum value.
• The blue plot corresponds to the smallest value of α; it converges very slowly because each update step is tiny.
How to choose the learning rate? Practical tips
Stopping condition (automatic convergence test): gradient descent can be considered to have
converged when the drop in the cost function between iterations is no more than a preset
threshold, say 10⁻³.
To find a good value of α, run the algorithm with different values, e.g. 1, 0.3, 0.1, 0.03,
0.01, and plot the learning curve to judge whether the value should be increased or decreased.
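A hedged sketch of both tips together: sweep several values of α on the one-parameter problem, stop on the 10⁻³ convergence test, and flag divergence (the toy data and all names here are my own assumptions):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def J(theta1):
    return np.sum((theta1 * x - y) ** 2) / (2 * len(y))

for alpha in (1, 0.3, 0.1, 0.03, 0.01):
    theta1, prev = 3.0, np.inf
    for it in range(500):
        theta1 -= alpha * np.mean((theta1 * x - y) * x)  # gradient step
        c = J(theta1)
        if c > prev:                      # cost rising: alpha too large
            print(f"alpha={alpha}: diverging, decrease alpha"); break
        if prev - c < 1e-3:               # automatic convergence test
            print(f"alpha={alpha}: converged after {it + 1} iterations"); break
        prev = c
    else:
        print(f"alpha={alpha}: still going after 500 iterations")
```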
Is it required to decrease the value of α over time?
No. With a fixed learning rate α, the slope keeps getting smaller as the minimum is approached
(slope1 > slope2 > slope3 > …), so the step size automatically shrinks over time and there is
no need to change α.
Applying gradient descent to the linear model
Model against dataset: the housing price problem
• One feature (area → price), m examples: hθ(x) = θ0 + θ1x1
• Two features (area, age → price), m examples: hθ(x) = θ0 + θ1x1 + θ2x2
• n features (area, …, #floors → price), m examples: hθ(x) = θ0 + θ1x1 + ⋯ + θnxn
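All of these hypotheses collapse into a single dot product once a constant feature x0 = 1 is prepended to every example; a generic sketch (the function name and shapes are my own):

```python
import numpy as np

def predict(X, theta):
    """h_theta(x) = theta0*1 + theta1*x1 + ... + thetan*xn for every row of X."""
    X = np.atleast_2d(X)
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x0 = 1
    return X_b @ theta                              # theta has n + 1 entries
```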
Gradient descent algorithm applied to the linear regression model
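Combining the generic update rule with the linear regression model (reconstructed here to match the derivatives derived below), the algorithm becomes:

```latex
\text{repeat until convergence } \{ \\
\quad \theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\
\quad \theta_1 := \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \\
\} \qquad \text{updating } \theta_0 \text{ and } \theta_1 \text{ simultaneously}
```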
In order to apply gradient descent, the derivative term $\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ needs to be calculated:

$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
= \frac{\partial}{\partial \theta_j} \left( \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \right)
= \frac{1}{2m} \, \frac{\partial}{\partial \theta_j} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$

Carrying out the differentiation for each parameter gives:

$$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \qquad \text{for } j = 0$$

$$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \qquad \text{for } j = 1$$
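Putting the two derivatives to work, a compact batch gradient descent for the univariate model could look like this (a sketch; the names and defaults are my assumptions, not the instructor's code):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=1000):
    """Fit h_theta(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errors = theta0 + theta1 * x - y   # h_theta(x^(i)) - y^(i)
        grad0 = errors.sum() / m           # (1/m) * sum of errors       (j = 0)
        grad1 = (errors * x).sum() / m     # (1/m) * sum of errors * x   (j = 1)
        theta0 -= alpha * grad0            # simultaneous update of both
        theta1 -= alpha * grad1
    return theta0, theta1
```

With the fitted parameters in hand, predicting the price of a house of a given size is simply theta0 + theta1 * size.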
Risk of meeting different local optima
If more than one local optimum exists, gradient descent may converge to a non-global optimum,
depending on the starting point.
[Surface plot with two distinct local optima marked]
Fortunately, the cost/error function of linear regression is always convex (bowl-shaped), so it has
a single global minimum and gradient descent cannot get stuck in a non-global local optimum.
[Sequence of slides animating gradient descent on the housing data: left, hθ(x) — for fixed θ0, θ1, a function of x; right, the contour plot of J(θ0, θ1) — a function of the parameters. Each step moves the parameter point closer to the minimum and the fitted line closer to the data.]
Now you can do prediction, given a house size!
The gradient descent technique that uses all the training examples in each step is called Batch
Gradient Descent: the derivative term is computed over all the training examples, as can be seen
in the equations above.
Linear regression can also be solved using the Normal Equations method, but that approach has the
disadvantage of not scaling well to larger datasets, while gradient descent does.
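For reference, a minimal sketch of the normal equations approach, which solves θ = (XᵀX)⁻¹Xᵀy in one shot (the cubic cost of this solve is what hurts as the problem grows; the function name is mine):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: solve (X^T X) theta = X^T y."""
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x0 = 1
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
```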
There are plenty of optimizers beyond plain gradient descent. Examples:
1. Adagrad
2. Adadelta
3. Adam
4. Conjugate Gradients
5. BFGS
6. Momentum
7. Nesterov Momentum
8. Newton’s Method
9. RMSProp
10. SGD
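As a flavor of how these optimizers refine the basic rule, the classic momentum update keeps a running velocity of past gradients (a generic sketch, not tied to any particular library):

```python
def momentum_step(theta, grad, velocity, alpha=0.01, beta=0.9):
    """One momentum update: velocity is a decaying sum of past gradients."""
    velocity = beta * velocity - alpha * grad  # remember the direction of past steps
    return theta + velocity, velocity
```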