INTRO TO MACHINE LEARNING
150 MIN
5.0
DMYTRO FISHMAN
UNIVERSITY OF TARTU
INSTITUTE OF COMPUTER SCIENCE
New York City Taxi Fare Prediction
https://www.kaggle.com/c/new-york-city-taxi-fare-prediction
[Figure: scatter plot of y against x]
type in your browser:
tinyurl.com/yxb5k5jl
(save a copy to your drive)
The following slides are inspired by
“An Introduction to Linear Regression Analysis” video
https://youtu.be/zPG4NjIkCjc
Linear Regression
[Figure: X (independent variable) on the horizontal axis, y (dependent variable) on the vertical axis]
How does a change in the independent variable influence the dependent variable?
Positive relationship: y increases as X increases.
Negative relationship: y decreases as X increases.
In order to build a linear regression, we need observations.
[Figure: observations plotted as points, X (independent variable) against y (dependent variable)]
We want to find a line that minimises the sum of squared errors, where each error is the difference between the actual value $y_i$ and the value $\hat{y}_i$ estimated by the line.
[Figure: regression line through the observations, with the actual value, the estimated value and the error marked for one point]
Least squares method:
Regression line $= \arg\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Linear Regression
[Figure: fare amount (y) against distance (X)]
$\hat{y} = w_0 + w_1 x$
$\arg\min_{w_0, w_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
The regression line minimises the sum of squared errors with respect to $w_0$ and $w_1$.
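To make the objective concrete, here is a minimal sketch that evaluates the sum of squared errors for a given pair of weights. The data are the five observations from this deck's worked example; the function name and the two candidate lines are chosen here only for illustration.

```python
# Observations from the deck's worked example: distance (x) and fare amount (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_squared_errors(w0, w1, xs, ys):
    """Sum of squared differences between actual y and estimated w0 + w1 * x."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


# Two candidate lines, picked only for comparison (an assumption of this sketch).
flat_line = sum_squared_errors(4.0, 0.0, xs, ys)    # always predicts 4
sloped_line = sum_squared_errors(2.2, 0.6, xs, ys)  # the line from the worked example

print(flat_line, sloped_line)
```

The line with the smaller value fits the observations better in the least squares sense; least squares looks for the pair of weights for which this value is as small as possible.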
Linear Regression (example)
[Figure: fare amount (y) against distance (X) for the five observations below]

x    y    x - x̄    y - ȳ    (x - x̄)²    (x - x̄)(y - ȳ)
1    2    -2       -2       4           4
2    4    -1        0       1           0
3    5     0        1       0           0
4    4     1        0       1           0
5    5     2        1       4           2

Means: x̄ = 3, ȳ = 4. Column sums: Σ(x - x̄)² = 10, Σ(x - x̄)(y - ȳ) = 6.

$w_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{6}{10} = 0.6$
The least squares line $\hat{y} = w_0 + w_1 x$ passes through the point of means $(\bar{x}, \bar{y})$, so $4 = w_0 + 0.6 \cdot 3$, which gives $w_0 = 2.2$.
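The same arithmetic can be checked in a few lines of plain Python. This is a sketch of the closed-form calculation from the table above; the function name `fit_line` is an assumption of this example, not something from the slides or the Colab.

```python
# The five observations from the table: distance (x) and fare amount (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def fit_line(xs, ys):
    """Return (w0, w1) of the least squares line y_hat = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope, exactly as in the table columns.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    # The line passes through the point of means, which fixes the intercept.
    w0 = y_mean - w1 * x_mean
    return w0, w1


w0, w1 = fit_line(xs, ys)
print(round(w0, 2), round(w1, 2))  # 2.2 0.6
```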
Let’s return to our Colabs
Decision Tree Algorithm
By asking a simple question about the value of the independent variable, it tries to predict the value of the dependent variable.
[Figure: fare amount (y) against distance (X) for the five observations]
Is distance > X?
  False: fare amount = Y
  True: fare amount = Z
The question is the root node. Its False branch leads to the left child and its True branch to the right child. Nodes that hold a prediction instead of a question are leaves.
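A one-question tree like this is just an if/else. The sketch below assumes placeholder numbers for X, Y and Z; the function and parameter names are hypothetical and chosen for readability.

```python
def predict_fare(distance, threshold, left_value, right_value):
    """One-question tree: the root node asks 'is distance > threshold?'."""
    if distance > threshold:
        return right_value  # True branch: right child (a leaf)
    return left_value       # False branch: left child (a leaf)


# Placeholder values standing in for X, Y and Z.
print(predict_fare(1.0, threshold=2.5, left_value=3.0, right_value=5.0))  # 3.0
print(predict_fare(4.0, threshold=2.5, left_value=3.0, right_value=5.0))  # 5.0
```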
Here, X may correspond to any vertical line. For example, if X = 2.5:
Is distance > 2.5?
  False: fare amount = Y
  True: fare amount = Z
[Figure: the five observations with a vertical line at distance = 2.5]
What are the most reasonable values for Y and Z (those that minimise total MSE)?
What would the MSE be if Y = 4 and Z = 5?
Is distance > 2.5?
  False: fare amount = 4
  True: fare amount = 5
$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the real value and $\hat{y}_i$ is the predicted value.
$\mathrm{MSE} = \frac{(y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + (y_3 - \hat{y}_3)^2 + (y_4 - \hat{y}_4)^2 + (y_5 - \hat{y}_5)^2}{5}$
$= \frac{(2 - 4)^2 + (4 - 4)^2 + (5 - 5)^2 + (4 - 5)^2 + (5 - 5)^2}{5} = \frac{4 + 0 + 0 + 1 + 0}{5} = \frac{5}{5} = 1$
So, if X = 2.5, Y = 4 and Z = 5, the MSE is 1.
Can we find better Y and Z?
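Trying other values of Y and Z by hand gets tedious, so here is a small sketch that computes the MSE of a one-question tree on the five observations. The function name `stump_mse` and its parameters are assumptions of this example.

```python
# The five observations used throughout the deck.
distances = [1, 2, 3, 4, 5]
fares = [2, 4, 5, 4, 5]


def stump_mse(threshold, left_value, right_value, xs, ys):
    """MSE of the tree 'is x > threshold?' with fixed leaf values."""
    predictions = [right_value if x > threshold else left_value for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


print(stump_mse(2.5, 4, 5, distances, fares))  # 1.0
```

Changing the second and third arguments lets you test any other pair of leaf values against the same split.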
Try Y = 3 and Z = 5:
Is distance > 2.5?
  False: fare amount = 3
  True: fare amount = 5
$\mathrm{MSE} = \frac{(2 - 3)^2 + (4 - 3)^2 + (5 - 5)^2 + (4 - 5)^2 + (5 - 5)^2}{5} = \frac{1 + 1 + 0 + 1 + 0}{5} = \frac{3}{5} = 0.6$
So, if X = 2.5, Y = 3 and Z = 5, the MSE is 0.6.
Now take the mean fare amount on each side of the split: Y = (2 + 4)/2 = 3 and Z = (5 + 4 + 5)/3 ≈ 4.67.
Is distance > 2.5?
  False: fare amount = 3
  True: fare amount = 4.67
$\mathrm{MSE} = \frac{(2 - 3)^2 + (4 - 3)^2 + (5 - 4.67)^2 + (4 - 4.67)^2 + (5 - 4.67)^2}{5} = \frac{1 + 1 + \frac{1}{9} + \frac{4}{9} + \frac{1}{9}}{5} \approx \frac{2.67}{5} \approx 0.53$
So, for X = 2.5, the MSE is smallest when Y = 3 and Z = 4.67.
Are we happy?
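Why do the per-side means give the smallest MSE for a fixed split? A short derivation, written for a generic leaf that holds $m$ observations and predicts a constant $c$ (both symbols are introduced only for this sketch), sets the derivative of the leaf's squared error to zero. The second derivative is $2m > 0$, so the stationary point is a minimum.

```latex
\begin{aligned}
\frac{d}{dc}\sum_{i=1}^{m}(y_i - c)^2 &= -2\sum_{i=1}^{m}(y_i - c) = 0
\quad\Longrightarrow\quad c = \frac{1}{m}\sum_{i=1}^{m} y_i \\
Y &= \frac{2 + 4}{2} = 3, \qquad Z = \frac{5 + 4 + 5}{3} = \frac{14}{3} \approx 4.67
\end{aligned}
```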
Hold on, how did we choose this split in the first place?
Maybe there are better options?
What are the possible split options in this case?
X = 0.5, 1.5, 2.5, 3.5, 4.5, 5.5
Are 0.5 and 5.5 meaningful? They leave all five observations on one side, so they do not separate the data.
How do we compare the remaining options, 1.5, 2.5, 3.5 and 4.5?
For each remaining split we can compute the MSE, with Y and Z set to the mean fare amount on each side:

Split X    Y       Z       MSE
1.5        2       4.5     0.2
2.5        3       4.67    0.53
3.5        3.67    4.5     1.03
4.5        3.75    5       0.95

For example, for X = 1.5: MSE = (0 + 0.25 + 0.25 + 0.25 + 0.25)/5 = 0.2.
We choose the split that minimises total MSE.
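The comparison of candidate splits can be sketched as an exhaustive search: try every midpoint between neighbouring distances, set each leaf to the mean of its side, and keep the split with the lowest MSE. The function names below are assumptions of this example, not part of the slides.

```python
# The five observations used throughout the deck.
distances = [1, 2, 3, 4, 5]
fares = [2, 4, 5, 4, 5]


def mean(values):
    return sum(values) / len(values)


def best_split(xs, ys):
    """Return (mse, threshold, left_value, right_value) of the best single split."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((y - left_value) ** 2 for y in left)
               + sum((y - right_value) ** 2 for y in right))
        mse = sse / len(pairs)
        if best is None or mse < best[0]:
            best = (mse, threshold, left_value, right_value)
    return best


mse, threshold, left_value, right_value = best_split(distances, fares)
print(threshold, left_value, right_value, round(mse, 2))  # 1.5 2.0 4.5 0.2
```

Splits outside the data range (such as 0.5 and 5.5) never appear here, because only midpoints between neighbouring observations are tried.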
Thus, the resulting tree (split at 1.5, MSE = 0.2):
Is distance > 1.5?
  False: fare amount = 2
  True: fare amount = 4.5
Can we make our decision tree more accurate?
Yes, by going deeper!
distance > 1.5?
  False: fare amount = 2
  True: distance > X?
    False: fare amount = Y
    True: fare amount = Z
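Going deeper means repeating the same split search inside each child. A minimal recursive sketch, assuming a tree is stored either as a number (a leaf) or as a tuple `(threshold, left, right)`; this representation and all function names are choices made for this example only.

```python
# The five observations used throughout the deck, as (distance, fare) pairs.
pairs = list(zip([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))


def mean(values):
    return sum(values) / len(values)


def sse(side):
    """Sum of squared errors when a side predicts its own mean."""
    ys = [y for _, y in side]
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(pairs, max_depth):
    """Return a leaf value, or (threshold, left_subtree, right_subtree)."""
    xs = sorted({x for x, _ in pairs})
    if max_depth == 0 or len(xs) < 2:
        return mean([y for _, y in pairs])  # leaf: predict the mean
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [p for p in pairs if p[0] <= threshold]
        right = [p for p in pairs if p[0] > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[0]:
            best = (total, threshold, left, right)
    _, threshold, left, right = best
    return (threshold,
            build_tree(left, max_depth - 1),
            build_tree(right, max_depth - 1))


def predict(tree, x):
    # Walk down from the root until a leaf (a plain number) is reached.
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = right if x > threshold else left
    return tree


def tree_mse(tree, pairs):
    return sum((y - predict(tree, x)) ** 2 for x, y in pairs) / len(pairs)


shallow = build_tree(pairs, max_depth=1)
deeper = build_tree(pairs, max_depth=2)
print(round(tree_mse(shallow, pairs), 2))  # 0.2
print(round(tree_mse(deeper, pairs), 2))   # 0.13
```

Each extra level can only lower the error on the data the tree was built from, which is exactly why depth needs to be checked against data the tree has not seen.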
Let’s return to our Colabs
Overfitting
[Figure: two models fitted to the same fare amount vs. distance data, side by side]
Simple, but imperfect VS Complicated, but ideal
Train/val split
Initial dataset
  Train dataset (randomly select 60%)
    Simple, but imperfect: MSE = 1.0
    Complicated, but ideal: MSE = 0.0
  Validation (val) dataset (randomly select 40%)
    MSE = 2.5 and MSE = 0.5 (in the order shown on the slide)
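A random 60/40 split can be sketched with the standard library alone. The rows below are made up for illustration, and the two stand-in models (one that always predicts the training mean, one that memorises the training rows) are assumptions of this example rather than the models shown on the slide.

```python
import random


def train_val_split(rows, train_fraction=0.6, seed=0):
    """Shuffle a copy of the rows and cut it into train and validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mse(model, rows):
    return sum((y - model(x)) ** 2 for x, y in rows) / len(rows)


# Hypothetical (distance, fare amount) rows, invented for this sketch.
rows = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5),
        (6, 7), (7, 6), (8, 8), (9, 9), (10, 9)]
train, val = train_val_split(rows)

# Simple stand-in model: always predict the mean training fare.
train_mean = sum(y for _, y in train) / len(train)


def simple_model(x):
    return train_mean


# Complicated stand-in model: memorise every training row exactly,
# and fall back to the training mean for distances it has not seen.
lookup = dict(train)


def memorising_model(x):
    return lookup.get(x, train_mean)


print(len(train), len(val))  # 6 4
print("train:", mse(simple_model, train), mse(memorising_model, train))
print("val:  ", mse(simple_model, val), mse(memorising_model, val))
```

The memorising model scores a perfect 0.0 on the training rows by construction, so only the validation numbers say anything about how either model handles unseen data.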
POINTS
1. MACHINE LEARNING MODEL IS NOT MAGIC
2. YOU CAN SAVE AND LOAD ML MODELS
3. EVALUATING MODEL PERFORMANCE IS IMPORTANT
4. YOU MAY NEED TO RETRAIN YOUR MODELS
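As an illustration of point 2, one common way to save and load a model in Python is the standard `pickle` module. The sketch below assumes the "model" is the one-question tree from this deck stored as a plain dictionary; the dictionary keys, the file name and the `predict` helper are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical model: the split at 1.5 with leaf values 2 and 4.5.
model = {"threshold": 1.5, "left_value": 2.0, "right_value": 4.5}


def predict(model, distance):
    if distance > model["threshold"]:
        return model["right_value"]
    return model["left_value"]


# Save the model to a file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back and use it without retraining.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(predict(restored, 3))  # 4.5
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.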
THANK YOU

Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Introduction to Machine Learning for Taxify/Bolt

  • 1. INTRO TO MACHINE LEARNING 150 MIN 5.0 DMYTRO FISHMAN UNIVERSITY OF TARTU INSTITUTE OF COMPUTER SCIENCE
  • 2. New York City Taxi Fare Prediction https://www.kaggle.com/c/new-york-city-taxi-fare-prediction
  • 3. [Scatter plot of y against x] Type in your browser: tinyurl.com/yxb5k5jl (save a copy to your drive)
  • 4. The following slides are inspired by “An Introduction to Linear Regression Analysis” video https://youtu.be/zPG4NjIkCjc
  • 6. Linear Regression: how does a change in the independent variable (X) influence the dependent variable (y)?
  • 9-10. Linear Regression: in order to build a linear regression we need observations.
  • 12-15. Linear Regression: we want to find a line such that it minimises the sum of squared errors, where each error is the difference between the actual value $y_i$ and the estimated value $\hat{y}_i$. Least squares method: the regression line is the one given by $\arg\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
  • 19. Linear Regression (fare amount against distance): the model is $\hat{y} = w_0 + w_1 x$, and linear regression minimises the sum of squared errors with respect to $w_0$ and $w_1$: $\arg\min_{w_0, w_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
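For reference, a sketch of how this minimisation leads to a closed form. This is the standard least-squares result and is not derived in the slides; $E$ is a name introduced here for the objective, and $\bar{x}$, $\bar{y}$ denote the sample means:

```latex
\begin{aligned}
E(w_0, w_1) &= \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2 \\
\frac{\partial E}{\partial w_0} = 0 &\;\Rightarrow\; w_0 = \bar{y} - w_1 \bar{x} \\
\frac{\partial E}{\partial w_1} = 0 &\;\Rightarrow\; w_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```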
  • 20-21. Linear Regression (example), fare amount (y) against distance (x):

      x    y    x - x̄    y - ȳ    (x - x̄)²    (x - x̄)(y - ȳ)
      1    2     -2       -2        4             4
      2    4     -1        0        1             0
      3    5      0        1        0             0
      4    4      1        0        1             0
      5    5      2        1        4             2
      x̄ = 3     ȳ = 4             sum = 10      sum = 6

    $w_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{6}{10} = 0.6$. Substituting the means into $\hat{y} = w_0 + w_1 x$ gives $4 = w_0 + 0.6 \cdot 3$, so $w_0 = 2.2$. Let’s return to our Colabs.
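The worked example can be checked with a few lines of Python. This is a minimal sketch using only the five points from the slide; the function name `fit_line` is made up for illustration and is not taken from the Colab notebook.

```python
# Closed-form simple linear regression on the five points of the worked example.
xs = [1, 2, 3, 4, 5]  # distance
ys = [2, 4, 5, 4, 5]  # fare amount


def fit_line(xs, ys):
    """Return (w0, w1) minimising the sum of squared errors for y = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the slope formula from the slide.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    # The fitted line passes through the point of means.
    w0 = y_mean - w1 * x_mean
    return w0, w1


w0, w1 = fit_line(xs, ys)
print(f"w0 = {w0:.1f}, w1 = {w1:.1f}")  # w0 = 2.2, w1 = 0.6
```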
  • 22-23. Decision Tree Algorithm: by asking a simple question about the value of the independent variable, it tries to predict the value of the dependent variable (here, fare amount from distance).
  • 24-30. Decision Tree Algorithm: the question is "Is distance > X?". If False, fare amount = Y; if True, fare amount = Z. The question is the root node, the False branch leads to the left child and the True branch to the right child; the nodes holding Y and Z are the leaves.
  • 31-33. Decision Tree Algorithm: here, X may correspond to any vertical line. For example, if X = 2.5 the question becomes "Is distance > 2.5?". What are the most reasonable values for Y and Z (that minimise total MSE)?
  • 34-44. Decision Tree Algorithm: what would the MSE be if Y = 4 and Z = 5 (split "Is distance > 2.5?")? With $y_i$ the real value and $\hat{y}_i$ the predicted value, $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{(2-4)^2 + (4-4)^2 + (5-5)^2 + (4-5)^2 + (5-5)^2}{5} = \frac{4 + 0 + 0 + 1 + 0}{5} = \frac{5}{5} = 1$. So, if X = 2.5, Y = 4 and Z = 5, MSE is 1. Can we find better Y and Z?
  • 45-49. Decision Tree Algorithm: with Y = 3 and Z = 5 (same split, "Is distance > 2.5?"), $\mathrm{MSE} = \frac{(2-3)^2 + (4-3)^2 + (5-5)^2 + (4-5)^2 + (5-5)^2}{5} = \frac{1 + 1 + 0 + 1 + 0}{5} = \frac{3}{5} = 0.6$. So, if X = 2.5, Y = 3 and Z = 5, MSE is 0.6.
  • 50-52. Decision Tree Algorithm: with Y = 3 and Z ≈ 4.67 (the mean of 5, 4 and 5), $\mathrm{MSE} = \frac{(2-3)^2 + (4-3)^2 + (5-4.67)^2 + (4-4.67)^2 + (5-4.67)^2}{5} \approx \frac{1 + 1 + 0.11 + 0.44 + 0.11}{5} \approx \frac{2.67}{5} \approx 0.53$. So, for the split at 2.5, MSE is smallest when Y = 3 and Z ≈ 4.67, that is, when each leaf predicts the mean of the points that fall into it. Are we happy?
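The three candidate pairs of leaf values for the split at 2.5 can be compared numerically. The sketch below uses the five points from the slides; `stump_mse` and `mean` are helper names introduced here for illustration only.

```python
# Compare leaf values (Y, Z) for the fixed split "Is distance > 2.5?".
xs = [1, 2, 3, 4, 5]  # distance
ys = [2, 4, 5, 4, 5]  # fare amount


def mean(values):
    return sum(values) / len(values)


def stump_mse(xs, ys, threshold, left_value, right_value):
    """MSE of a one-split tree: predict right_value if x > threshold, else left_value."""
    errors = [
        (y - (right_value if x > threshold else left_value)) ** 2
        for x, y in zip(xs, ys)
    ]
    return sum(errors) / len(errors)


# Targets falling on each side of the split.
left = [y for x, y in zip(xs, ys) if x <= 2.5]
right = [y for x, y in zip(xs, ys) if x > 2.5]

mse_4_5 = stump_mse(xs, ys, 2.5, 4, 5)
mse_3_5 = stump_mse(xs, ys, 2.5, 3, 5)
mse_means = stump_mse(xs, ys, 2.5, mean(left), mean(right))

print(f"Y=4, Z=5:   {mse_4_5:.2f}")
print(f"Y=3, Z=5:   {mse_3_5:.2f}")
print(f"leaf means: {mse_means:.2f}")
```

Using the leaf means gives the lowest of the three values, which is why a regression tree that minimises MSE stores the mean target in each leaf.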
  • 53-54. Decision Tree Algorithm: hold on, how did we choose this split (distance > 2.5) in the first place? Maybe there are better options?
  • 55-60. Decision Tree Algorithm: what are the possible split options in this case? The candidates for X are 0.5, 1.5, 2.5, 3.5, 4.5 and 5.5. Are 0.5 and 5.5 meaningful? No: they leave every point on the same side of the split. How do we compare the remaining ones (1.5, 2.5, 3.5 and 4.5)? For each one we can compute the MSE; for X = 2.5 it is 0.53.
  • 61-63. Decision Tree Algorithm, X = 1.5: Y = 2, Z = 4.5, MSE = (0 + 0.25 + 0.25 + 0.25 + 0.25)/5 = 0.2.
  • 64-66. Decision Tree Algorithm, X = 3.5: Y ≈ 3.67, Z = 4.5, MSE = 1.03.
  • 67-69. Decision Tree Algorithm, X = 4.5: Y = 3.75, Z = 5, MSE = 0.95.
  • 70. Decision Tree Algorithm: we choose the split that minimises total MSE.

      X      1.5    2.5    3.5    4.5
      MSE    0.2    0.53   1.03   0.95
  • 71. Decision Tree Algorithm: thus, the resulting tree asks "Is distance > 1.5?". If False, fare amount = 2; if True, fare amount = 4.5. Its MSE is 0.2.
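The split search described above can be sketched in Python. This is an illustrative version on the five example points, assuming each leaf predicts the mean of its points; `split_mse` and `candidate_thresholds` are names made up here, not functions from the Colab notebook or any library.

```python
# Try every midpoint between neighbouring x values and keep the split with the lowest MSE.
xs = [1, 2, 3, 4, 5]  # distance
ys = [2, 4, 5, 4, 5]  # fare amount


def mean(values):
    return sum(values) / len(values)


def candidate_thresholds(xs):
    """Midpoints between consecutive distinct x values (1.5, 2.5, 3.5, 4.5 here)."""
    values = sorted(set(xs))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


def split_mse(xs, ys, threshold):
    """Total MSE when each side of the split predicts the mean of its own targets."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    sse = sum((y - mean(left)) ** 2 for y in left)
    sse += sum((y - mean(right)) ** 2 for y in right)
    return sse / len(ys)


scores = {t: split_mse(xs, ys, t) for t in candidate_thresholds(xs)}
best = min(scores, key=scores.get)

for threshold, score in scores.items():
    print(f"distance > {threshold}: MSE = {score:.2f}")
print("best split:", best)
```

The thresholds 0.5 and 5.5 never appear because only midpoints between observed values are generated, so every candidate leaves at least one point on each side.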
  • 72-74. Decision Tree Algorithm: can we make our decision tree more accurate? Yes, by going deeper! The False branch of "distance > 1.5" stays a leaf (fare amount = 2), while the True branch asks a second question, "distance > X?", with leaves fare amount = Y (False) and fare amount = Z (True). Let’s return to our Colabs.
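Going deeper amounts to repeating the same split search inside each child. The following is a minimal recursive sketch on the five example points, assuming mean-valued leaves and a `max_depth` stopping rule; the dictionary layout and function names are choices made here for illustration, not the implementation used in the Colab notebook.

```python
# A tiny recursive regression tree: each level repeats the best-split search.
xs = [1, 2, 3, 4, 5]  # distance
ys = [2, 4, 5, 4, 5]  # fare amount


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean of values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(points, max_depth):
    """points is a list of (x, y). Returns a leaf value or a dict describing a split."""
    distinct_x = sorted({x for x, _ in points})
    if max_depth == 0 or len(distinct_x) < 2:
        return mean([y for _, y in points])
    best = None
    for a, b in zip(distinct_x, distinct_x[1:]):
        threshold = (a + b) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        score = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    _, threshold, left, right = best
    return {
        "threshold": threshold,
        "false": build_tree(left, max_depth - 1),   # distance <= threshold
        "true": build_tree(right, max_depth - 1),   # distance > threshold
    }


def predict(tree, x):
    """Follow the questions down to a leaf and return its value."""
    while isinstance(tree, dict):
        tree = tree["true"] if x > tree["threshold"] else tree["false"]
    return tree


def tree_mse(tree, xs, ys):
    return sum((y - predict(tree, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


points = list(zip(xs, ys))
shallow = build_tree(points, max_depth=1)
deeper = build_tree(points, max_depth=2)
print(f"depth 1 MSE: {tree_mse(shallow, xs, ys):.2f}")
print(f"depth 2 MSE: {tree_mse(deeper, xs, ys):.2f}")
```

On these five points the depth-2 tree has a lower training MSE than the depth-1 tree, because the True branch is allowed one more question.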
  • 75. Overfitting: two fits of fare amount against distance, one simple but imperfect, the other complicated but ideal.
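  • 76. Train/val split: from the initial dataset, randomly select 60% as the train dataset and 40% as the validation (val) dataset. On the train dataset, the simple but imperfect model has MSE = 1.0 and the complicated but ideal model has MSE = 0.0. On the validation dataset, the two MSE values shown are 2.5 and 0.5.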
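A random 60/40 split can be sketched with the standard library alone. Everything below is an assumption made for illustration: the `rows` data are made-up (distance, fare) pairs, the "simple" model is a least-squares line, and the "complicated" model simply memorises the training points and answers with the nearest one. None of this is taken from the Colab notebook.

```python
import random


def train_val_split(rows, train_fraction=0.6, seed=0):
    """Shuffle a copy of rows and cut it into (train, val)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Simple model: least-squares line y = w0 + w1 * x."""
    n = len(rows)
    x_mean = sum(x for x, _ in rows) / n
    y_mean = sum(y for _, y in rows) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in rows)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return lambda x: w0 + w1 * x


def fit_memoriser(rows):
    """Complicated model: return the target of the nearest training point."""
    return lambda x: min(rows, key=lambda p: abs(p[0] - x))[1]


def mse(model, rows):
    return sum((y - model(x)) ** 2 for x, y in rows) / len(rows)


# Hypothetical (distance, fare amount) pairs, invented for this sketch.
rows = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5),
        (6, 7), (7, 6), (8, 8), (9, 9), (10, 9)]

train, val = train_val_split(rows)
simple = fit_line(train)
complicated = fit_memoriser(train)

for name, model in [("simple", simple), ("complicated", complicated)]:
    print(f"{name}: train MSE = {mse(model, train):.2f}, val MSE = {mse(model, val):.2f}")
```

The memorising model always reaches a train MSE of 0 here, so only the validation MSE says anything about how it behaves on unseen trips.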
  • 78. Points: (1) a machine learning model is not magic; (2) you can save and load ML models; (3) evaluating model performance is important; (4) you may need to retrain your models.