Introduction to Machine
Learning
(Supervised learning)
Dmytro Fishman (dmytro@ut.ee)
This is an introduction to the topic
This is an introduction to the topic
We will try to provide a beautiful scenery
“We love you, Mummy!”
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
“We love you, Mummy!”
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
“We love you, Mummy!”
Petal
Sepal
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
“We love you, Mummy!”
Petal
Sepal
Word 1 25
Word 2 23
Word 3 12
… …
Petal
Sepal
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
Word 1 25
Word 2 23
Word 3 12
… …
“We love you, Mummy!”
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Big Data: Astronomical or Genomical? http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Big Data
Astronomical?
Youtubical?
Big Data: Astronomical or Genomical? http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Genomical?
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Big Data
Astronomical? Genomical?
Youtubical?
1 Exabyte/year 2-40 Exabyte/year
1-2 Exabyte/year
Big Data: Astronomical or Genomical? http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
Big Data
Astronomical? Genomical?
Youtubical?
1 Exabyte/year 2-40 Exabyte/year
1-2 Exabyte/year
Big Data: Astronomical or Genomical? http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195
1 Exabyte = 10^12 MB
There is a lot of data
produced nowadays
But there are also a vast number of potential ways to use this
data
Supervised Learning
Benign Malignant
Skin cancer example
Malignant?
Tumour size
Benign Malignant
Skin cancer example
Yes(1)
No(0)
Malignant?
Tumour size
Benign Malignant
Skin cancer example
Yes(1)
No(0)
Malignant?
Tumour size
Benign Malignant
Skin cancer example
Yes(1)
No(0)
Malignant?
Tumour size
Yes(1)
Benign Malignant
Skin cancer example
No(0)
Malignant?
Tumour size
Yes(1)
No(0)
Benign Malignant
Skin cancer example
Malignant?
Tumour size
Age
Tumour size
Age
Malignant?
Tumour size
Age
Malignant?
Tumour size
Age
Malignant?
Other features:
Lesion type
Lesion configuration
Texture
Location
Distribution
…
Potentially infinitely many features!
Classification task
Predicting discrete value output using previously labeled
examples
Binary classification
Classification task
Predicting discrete value output using previously labeled
examples
also binary classification
Classification task
Predicting discrete value output using previously labeled
examples
also binary classification
Every time you have to distinguish between TWO
CLASSES it is a binary classification
Classification task
Multiclass classification
Predicting discrete value output using previously labeled
examples
Housing price prediction
Housing price prediction
Size in m2
Price in
1000’s ($)
400
100
200
300
100 200 300 400 500
Housing price prediction
Size in m2
Price in
1000’s ($)
400
100
200
300
100 200 300 400 500
Housing price prediction
Size in m2
Price in
1000’s ($)
400
100
200
300
100 200 300 400 500
Price?
Housing price prediction
Size in m2
Price in
1000’s ($)
400
100
200
300
100 200 300 400 500
Price?
Housing price prediction
Size in m2
Price in
1000’s ($)
400
100
200
300
100 200 300 400 500
Price
Housing price prediction
Size in m2
Price in
1000’s ($)
400
100
200
300
100 200 300 400 500
Price
Regression task
Size in m2
Price in
1000’s ($)
400
100
200
300
100 200 300 400 500
Price
Malignant?
Tumour size
Yes(1)
No(0)
Benign Malignant
Malignant?
VS
Classification Regression
Supervised Learning
Size in m2
Pricein1000’s($)
400
100
200
300
100 200 300 400 500
Price?
You are running a company which has two problems,
namely:

Q:
a. Both problems are examples of classification problems
b. The first one is a classification task and the second one a
regression problem
c. The first one is a regression problem and the second one a
classification task
d. Both problems are regression problems
1. For each user in the database predict if this user will
continue using your company’s product or will move to
competitors (churn).
2. Predict the profit of your company at the end of this year
based on previous records.
How would you approach these problems?
You are running a company which has two problems,
namely:

Q:
a. Both problems are examples of classification problems
b. The first one is a classification task and the second one a
regression problem
c. The first one is a regression problem and the second one a
classification task
d. Both problems are regression problems
1. For each user in the database predict if this user will
continue using your company’s product or will move to
competitors (churn).
2. Predict the profit of your company at the end of this year
based on previous records.
How would you approach these problems?
Unsupervised Learning
examples slides
Clustering with google queries
In contrast to the first category, we don't have labels for our
classes (the graphs with two features from the previous examples
turn into unlabelled ones)
Gene expression clustering
Quiz question: "Of the following examples, which would you
address using an unsupervised learning algorithm?"
Tumour size
Age
Supervised Learning
Tumour size
Age
Unsupervised Learning
Tumour size
Age
Unsupervised Learning
Is there any interesting hidden structure in this data?
Tumour size
Age
Unsupervised Learning
Is there any interesting hidden structure in this data?
What does this hidden structure correspond to?
Gene expression
Gene expression
Two interesting
groups of
species
Gene expression
Two interesting
groups of genes
Q1: Some telecommunication company wants to segment
their customers into distinct groups in order to send
appropriate subscription offers, this is an example of ...
Q2: You are given data about seismic activity in Japan, and
you want to predict a magnitude of the next earthquake,
this is an example of ...
Q3: Assume you want to perform supervised learning and
to predict number of newborns according to size of storks'
population (http://www.brixtonhealth.com/storksBabies.pdf),
it is an example of ...
Q4: Discriminating between spam and ham e-mails is a
classification task, true or false?
Q1: Some telecommunication company wants to segment
their customers into distinct groups in order to send
appropriate subscription offers, this is an example of
clustering
Q2: You are given data about seismic activity in Japan, and
you want to predict a magnitude of the next earthquake,
this is an example of ...
Quiz: Assume you want to perform supervised learning and
to predict number of newborns according to size of storks'
population (http://www.brixtonhealth.com/storksBabies.pdf),
it is an example of ...
Quiz: Discriminating between spam and ham e-mails is a
classification task, true or false?
Q1: Some telecommunication company wants to segment
their customers into distinct groups in order to send
appropriate subscription offers, this is an example of
clustering
Q2: You are given data about seismic activity in Japan, and
you want to predict a magnitude of the next earthquake,
this is an example of regression
Q3: Assume you want to perform supervised learning and
to predict number of newborns according to size of storks'
population (http://www.brixtonhealth.com/storksBabies.pdf),
it is an example of ...
Quiz: Discriminating between spam and ham e-mails is a
classification task, true or false?
Q3: Assume you want to perform supervised learning and
to predict number of newborns according to size of storks'
population (http://www.brixtonhealth.com/storksBabies.pdf),
it is an example of stupidity regression
Q4: Discriminating between spam and ham e-mails is a
classification task, true or false?
Q2: You are given data about seismic activity in Japan, and
you want to predict a magnitude of the next earthquake,
this is an example of regression
Q1: Some telecommunication company wants to segment
their customers into distinct groups in order to send
appropriate subscription offers, this is an example of
clustering
Q4: Discriminating between spam and ham e-mails is a
classification task.
Q3: Assume you want to perform supervised learning and
to predict number of newborns according to size of storks'
population (http://www.brixtonhealth.com/storksBabies.pdf),
it is an example of stupidity regression
Q2: You are given data about seismic activity in Japan, and
you want to predict a magnitude of the next earthquake,
this is an example of regression
Q1: Some telecommunication company wants to segment
their customers into distinct groups in order to send
appropriate subscription offers, this is an example of
clustering
MNIST dataset
(10000 images)
Instance Label
28px
28px
3
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
3
MNIST dataset
(10000 images)
In total 784 pixel values
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
3
Pixel values Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values
Pixel values Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values
Feature
Pixel values Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values
Feature
Pixel values Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values
Feature
Pixel values Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values. Features are also sometimes
referred to as dimensions
Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values. Features are also sometimes
referred to as dimensions
These images are 784-dimensional
Pixel values Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values
Data is loaded.
What should we do now?
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values
Data is loaded.
What should we do now?
We would like to build a tool that would
be able to automatically recognise
handwritten images
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 1
Instances
MNIST dataset
(10000 images)
Instance Label
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 155 255 255 255 155 0
255 255 255 255 255 255 255
255 155 78 78 155 255 255
255 0 0 0 0 155 255
28px
28px
(0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …)
could be downloaded from: http://yann.lecun.com/exdb/mnist/
3
In total 784 pixel values
Data is loaded.
What should we do now?
We would like to build a tool that would
be able to automatically recognise
handwritten images
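For readers who want to follow along in code, here is a minimal sketch of loading MNIST as 784-dimensional feature vectors. It assumes scikit-learn is available and uses its fetch_openml helper, which hosts a copy of the same data (the deck's link points to Yann LeCun's original files, and the deck works with a 10,000-image subset):

import numpy as np
from sklearn.datasets import fetch_openml

# Download (on first call) the MNIST digits as flat 784-dimensional vectors.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
print(X.shape)   # (70000, 784): one row per image, one pixel value per column
print(y[:5])     # labels are the digit characters '0'..'9'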
Let’s get to the first algorithm
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
What about computing their pixel-wise difference?
OR
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
Σ_i^784 |A_i - B_i|    vs    Σ_i^784 |A_i - C_i|
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
Σ_i^784 |A_i - B_i|    vs    Σ_i^784 |A_i - C_i|
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
Σ_i^784 |A_i - B_i|    vs    Σ_i^784 |A_i - C_i|
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
Σ_i^784 |A_i - B_i| = 137.03    Σ_i^784 |A_i - C_i| = 107.38
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
Σ_i^784 |A_i - B_i| = 137.03    Σ_i^784 |A_i - C_i| = 107.38
A is more similar to C than B
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
Σ_i^784 |A_i - B_i| = 137.03    Σ_i^784 |A_i - C_i| = 107.38
A is more similar (closer) to C than B
Σ_i^784 |A_i - B_i| = 137.03    Σ_i^784 |A_i - C_i| = 107.38
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
A is more similar (closer) to C than B
Σ_i^784 |A_i - B_i| = 137.03    Σ_i^784 |A_i - C_i| = 107.38
How to quantitatively say which of
these pairs are more similar?
& &
A B CA
OR
A is more similar (closer) to C than B
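As an illustration (not from the slides), the pixel-wise comparison above fits in a few lines of NumPy; a, b are assumed to be flattened 28x28 images:

import numpy as np

def l1_distance(a, b):
    # Sum of absolute pixel-wise differences between two flattened 28x28 images.
    return np.sum(np.abs(a.astype(float) - b.astype(float)))

# The pair with the smaller sum is the more similar one, e.g.
# l1_distance(A, C) < l1_distance(A, B) means A is closer to C than to B.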
Instance Label
?
Dataset
For each new instance
We asked our friend to
write a bunch of new
digits so that we can
have something to
recognise, here is the
first one of them
Instance Label
?
Dataset
For each new instance
Instance Label
?
1.Compute pixel-wise
distance to all training
examples
For each new instance Dataset
Instance Label
?
1.Compute pixel-wise
distance to all training
examples
For each new instance Dataset
Instance Label
?
1.Compute pixel-wise
distance to all training
examples
For each new instance Dataset
Instance Label
?
1.Compute pixel-wise
distance to all training
examples
For each new instance Dataset
Instance Label
?
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
For each new instance Dataset
Instance Label
?
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
For each new instance Dataset
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report its label
Nearest Neighbour
classifier
For each new instance Dataset
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report its label
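A minimal sketch of this Nearest Neighbour procedure in NumPy, assuming X_train holds the 784-dimensional training vectors and y_train their labels:

import numpy as np

def nn_predict(X_train, y_train, x_new):
    # 1. Compute the pixel-wise (L1) distance to all training examples
    distances = np.abs(X_train.astype(float) - x_new.astype(float)).sum(axis=1)
    # 2. Find the closest training example
    closest = np.argmin(distances)
    # 3. Report its label
    return y_train[closest]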
Advantages of NN
Disadvantages of NN
For each new instance
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report its label
Advantages of NN
Disadvantages of NN
Very easy to implement
For each new instance
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report its label
Advantages of NN
Disadvantages of NN
Very easy to implement
Very slow classification time
For each new instance
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report its label
Advantages of NN
Disadvantages of NN
Very easy to implement
Suffers from
the curse of dimensionality
Could be a good choice for
low-dimensional problems
For each new instance
Very slow classification time
Curse of dimensionality
Remember we said
that our instances are
784 dimensional?
Curse of dimensionality
Remember we said
that our instances are
784 dimensional?
This is a lot!
http://cs231n.github.io/classification/
http://cs231n.github.io/classification/
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report its label
Advantages of NN
Disadvantages of NN
Very easy to implement
Very slow classification time
Suffers from
the curse of dimensionality
Could be a good choice for
low-dimensional problems
For each new instance
For each test example
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report its label
Advantages of NN
Disadvantages of NN
Fast training time O(C)
Very easy to implement
Very slow classification time
Suffers from
the curse of dimensionality
Could be a good choice for
low-dimensional problems
NN is rarely used in
practice
For each test example
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report its label
Advantages of NN
Disadvantages of NN
Fast training time O(C)
Very easy to implement
Suffers from
the curse of dimensionality
Could be a good choice for
low-dimensional problems
Can we find a better
algorithm?
Very slow classification time
VS
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Back to binary classification
*for a sec
VS
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Back to binary classification
*for a sec
pixel #213
pixel #213
VS
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Back to binary classification
*for a sec
pixel #213
> 163 <= 163
pixel #213
VS
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Back to binary classification
*for a sec
pixel #213
> 163 <= 163
pixel #216
pixel #216
> 30 <= 30
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
pixel #216
VS
Back to binary classification
*for a sec
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Decision tree
Instances
pixel #216
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
pixel #216
> 30 <= 30
VS
Back to binary classification
*for a sec
pixel #213
> 163 <= 163
Split
VS
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Back to binary classification pixel #213
> 163 <= 163
pixel #216
pixel #216
> 30 <= 30
VS
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Back to binary classification
*for a sec
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
How do you know which
features to use for best splits?
Split
VS
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Back to binary classification
*for a sec
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
How do you know which
features to use for best splits?
Split
Using various goodness metrics
such as information gain or gini
impurity to define “best”
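For concreteness, here is a minimal sketch (an illustration, not taken from the slides) of the Gini impurity that such splits try to reduce:

import numpy as np

def gini_impurity(labels):
    # 1 - sum_k p_k^2: 0 for a pure node, larger for more mixed classes.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# A candidate split is scored by the weighted impurity of its two child nodes;
# the tree greedily picks the feature/threshold that lowers impurity the most.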
Decision (classification) tree algorithm
1.Construct a decision
tree based on training
examples
Decision (classification) tree algorithm
1.Construct a decision
tree based on training
examples
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Decision (classification) tree algorithm
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Instance Label
?
2.Make corresponding
comparisons
1.Construct a decision
tree based on training
examples
#213
For each new instance
Decision (classification) tree algorithm
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Instance Label
?
2.Make corresponding
comparisons
1.Construct a decision
tree based on training
examples
#213
#216
For each new instance
Decision (classification) tree algorithm
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Instance Label
6
3. Report label
1.Construct a decision
tree based on training
examples
2.Make corresponding
comparisons
#213
#216
For each new instance
Decision (classification) tree algorithm
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Instance Label
6 Depth=2
Once the tree is constructed,
a maximum of 2 comparisons
would be needed to test a new
example
3. Report label
1.Construct a decision
tree based on training
examples
2.Make corresponding
comparisons
#213
#216
For each new instance
Decision (classification) tree algorithm
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Instance Label
6 Depth=2
In general decision trees are
*always faster than the NN
algorithm
3. Report label
1.Construct a decision
tree based on training
examples
2.Make corresponding
comparisons
*remember, shit happens
#213
#216
For each new instance
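In practice you would rarely build the tree by hand; here is a hedged scikit-learn sketch, assuming the X_train, y_train, X_test arrays from the MNIST data above:

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(criterion="gini")  # or criterion="entropy" for information gain
tree.fit(X_train, y_train)           # 1. construct the tree from training examples
predictions = tree.predict(X_test)   # 2.-3. run the comparisons and report labels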
Can we find a better
algorithm?
Disadvantages of NN
Very slow classification time
Suffers from
the curse of dimensionality
Can we find a better
algorithm?
Disadvantages of NN
Disadvantages of DT
Very slow classification time Very slow classification time
Suffers from
the curse of dimensionality
Can we find a better
algorithm?
Disadvantages of NN
Disadvantages of DT
Also suffers from
the curse of dimensionality
Very slow classification time Very slow classification time
Suffers from
the curse of dimensionality
Is there a way to break the curse?
Is there a way to break the curse?
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Decision tree algorithm is non-parametric and
deterministic
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Decision tree algorithm is non-parametric and
deterministic
The shape of the
tree is determined
by data not our
choice
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Decision tree algorithm is non-parametric and
deterministic
This means that we will always have the same
output given the same input…
The shape of the
tree is determined
by data not our
choice
The shape of the
tree is determined
by data not our
choice
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Decision tree algorithm is non-parametric and
deterministic
This means that we will always have the same
output given the same input…
Are all input dimensions equally important
for classification?
The shape of the
tree is determined
by data not our
choice
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Decision tree algorithm is non-parametric and
deterministic
This means that we will always have the same
output given the same input…
How about building a lot of trees from
random parts of the data and then merging
their predictions?
Are all input dimensions equally important
for classification?
The shape of the
tree is determined
by data not our
choice
pixel #213
> 163 <= 163
pixel #216
> 30 <= 30
Decision tree algorithm is non-parametric and
deterministic
This means that we will always have the same
output given the same input…
How about building a lot of trees from
random parts of the data and then merging
their predictions?
Are all input dimensions equally important
for classification?
Random forest algorithm
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Random forest algorithm
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Random forest algorithm
Randomly discard some rows
Randomly discard some rows and columns
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Random forest algorithm
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Random forest algorithm
Build a decision tree
based on remaining data
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Random forest algorithm
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
Build a decision tree
based on remaining data
Repeat N times until N trees
are constructed
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Random forest algorithm
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
pixel #213
> 163 <= 163
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Random forest algorithm
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
pixel #213
> 163 <= 163
pixel #214
> 253 <= 253
pixel #216
> 30 <= 30
Random forest algorithm
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
pixel #213
> 163 <= 163
pixel #214
> 253 <= 253
pixel #216
> 30 <= 30
Instance Label
?
For each new instance
Random forest algorithm
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
pixel #213
> 163 <= 163
pixel #214
> 253 <= 253
pixel #216
> 30 <= 30
Instance Label
?
For each new instance Use all constructed trees to
generate predictions
Random forest algorithm
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
pixel #213
> 163 <= 163
pixel #214
> 253 <= 253
pixel #216
> 30 <= 30
Instance Label
?
For each new instance Predictions
Tree #2
Tree #1
Tree #3
Random forest algorithm
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
pixel #213
> 163 <= 163
pixel #214
> 253 <= 253
pixel #216
> 30 <= 30
Instance Label
For each new instance Predictions
Tree #2
Tree #1
Tree #3?
Average 2/3
Random forest algorithm
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
pixel #213
> 163 <= 163
pixel #214
> 253 <= 253
pixel #216
> 30 <= 30
Instance Label
For each new instance Predictions
Tree #2
Tree #1
Tree #36
Average 2/3 = 66.6%
Random forest algorithm
Instance Label
6
For each new instance Predictions
Tree #2
Tree #1
Tree #3
Average 2/3 = 66.6%
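A minimal scikit-learn sketch of the same idea, assuming the X_train/y_train split from before; each tree sees a bootstrap sample of the rows and a random subset of features at each split, and the trees vote:

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100)  # build 100 randomised trees
forest.fit(X_train, y_train)
predictions = forest.predict(X_test)               # majority vote over the trees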
Quiz time
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
pixel #213
> 163 <= 163
pixel #214
> 253 <= 253
pixel #216
> 30 <= 30
Q1:
Which classification algorithm(s) has(ve) the following
weaknesses:
• It takes more time to train the classifier than to
classify a new instance
• It suffers from the curse of dimensionality
A. Nearest neighbour algorithm
B. Decision tree
C. Random forest algorithm
D. None of the above
E. All of the above
Q1:
A. Nearest neighbour algorithm
B. Decision tree
C. Random forest algorithm
D. None of the above
E. All of the above
• It takes more time to train the classifier than to
classify a new example
• It suffers from the curse of dimensionality
pixel #213
> 163 <= 163
pixel #216
> 0 = 0
Which classification algorithm(s) has(ve) the following
weaknesses:
Q2:
A. Prohibitively slow running time at
training given a lot of data
B. Highly biased classification due to the
prevalence of one of the classes
C. High classification error due to
excessively complex classifier
D. Poor performance of the classifier
trained on data with large number of
features
E. None of the above
Which of the following statements best defines the curse of
dimensionality
Q2:
Which of the following statements best defines the curse of
dimensionality
A. Prohibitively slow running time at
training given a lot of data
B. Highly biased classification due to the
prevalence of one of the classes
C. High classification error due to
excessively complex classifier
D. Poor performance of the classifier
trained on data with large number
of features
E. None of the above
Q3:
Which of the following algorithms would you prefer if you
had to classify instances from low-dimensional data?
A. Nearest neighbour algorithm
B. Decision tree algorithm
C. Random forest algorithm
D. All mentioned would cope
E. None of the above are suitable
A. Nearest neighbour algorithm
B. Decision tree algorithm
C. Random forest algorithm
D. All mentioned would cope
E. None of the above are suitable
Q3:
Which of the following algorithm(s) would you prefer if you
had to classify instances from low-dimensional data?
Support Vector
Machine
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 202 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Let us go primitive, and focus only on two
pixels
Feature vectors Labels
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6
0 0 0 0 0 0 0 0 59 163 254 202 254 194 112 18 0 0 0 0 … 3
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6
Instances
Let us go primitive, and focus only on two
pixels
It does not really matter which ones. I will take these two
because we got used to them already :)
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Let us go primitive, and focus only on two
pixels
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Now, let's visualise them on a 2-D plot: Pixel #213 on the x-axis, Pixel #215 on the y-axis, both running from 0 to 254
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Now, let's visualise them on a 2-D plot: Pixel #213 on the x-axis, Pixel #215 on the y-axis, both running from 0 to 254
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Now, let's visualise them on a 2-D plot: Pixel #213 on the x-axis, Pixel #215 on the y-axis, both running from 0 to 254
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Now, let's visualise them on a 2-D plot: Pixel #213 on the x-axis, Pixel #215 on the y-axis, both running from 0 to 254
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Support Vector Machine (SVM): scatter plot of Pixel #213 (x-axis) vs Pixel #215 (y-axis), both 0 to 254
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
A
B C
Is it A, B or C?
Support Vector Machine (SVM)
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Support Vector Machine (SVM)
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Margin
Support Vector Machine (SVM)
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Margin
Support Vector Machine (SVM)
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Support Vector Machine (SVM)
Closest points that define hyper-
plane are called support vectors
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
3. The larger the distance
from the hyper-plane to
the instance, the more
confident the classifier is
about its prediction
more
confidence
Support Vector Machine (SVM)
Closest points that define hyper-
plane are called support vectors
Features Labels
254 254 3
254 193 6
254 0 6
163 202 3
227 84 6
Instances
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Support Vector Machine (SVM)
3. The larger the distance
from the hyper-plane to
the instance, the more
confident the classifier is
about its prediction
Closest points that define hyper-
plane are called support vectors
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Support Vector Machine (SVM)
3. The larger the distance
from the hyper-plane to
the instance, the more
confident the classifier is
about its prediction
Closest points that define hyper-
plane are called support vectors
Instance Label
?
For each new instance
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Support Vector Machine (SVM)
3. The larger the distance
from the hyper-plane to
the instance, the more
confident the classifier is
about its prediction
Closest points that define hyper-
plane are called support vectors
Instance Label
?
For each new instance
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Support Vector Machine (SVM)
3. The larger the distance
from the hyper-plane to
the instance, the more
confident the classifier is
about its prediction
Closest points that define hyper-
plane are called support vectors
Instance Label
6
For each new instance
Pixel#215
Pixel #213
254
2540
0
1.Identify the right hyper-
plane
2. Maximise the distance
between nearest points
and a hyper-plane
Support Vector Machine (SVM)
3. The larger the distance
from the hyper-plane to
the instance, the more
confident the classifier is
about its prediction
Closest points that define hyper-
plane are called support vectors
Instance Label
6
For each new instance
Pixel#215
Pixel #213
254
2540
0
Support Vector Machine (SVM)
What should we do now?
y
x
254
2540
0
Support Vector Machine (SVM)
Let’s make another dimension
z = a*x² + b*y²
y
x
254
2540
0
Support Vector Machine (SVM)
Let’s make another dimension
z = a*x² + b*y²
z
x
2540
0
y
x
254
2540
0
Support Vector Machine (SVM)
Let’s make another dimension
z = a*x² + b*y²
z
x
2540
0
y
x
254
2540
0
Support Vector Machine (SVM)
Let’s make another dimension
z = a*x² + b*y²
z
x
2540
0
y
x
254
2540
0
Support Vector Machine (SVM)
Let’s make another dimension
z = a*x² + b*y²
z
x
2540
0
This transformation is called a kernel trick
and function z is the kernel
y
x
254
2540
0
Support Vector Machine (SVM)
Let’s make another dimension
z = a*x² + b*y²
z
x
2540
0
This transformation is called a kernel trick
and function z is the kernel
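A hedged sketch of fitting a kernelised SVM in scikit-learn (the RBF kernel is one common choice; the mapping to the higher-dimensional space happens implicitly, which is why it is a "trick"):

from sklearn.svm import SVC

svm = SVC(kernel="rbf", C=1.0, gamma="scale")  # kernelised SVM
svm.fit(X_train, y_train)
predictions = svm.predict(X_test)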
Wow, wow, wow, hold on!
How does this actually work?
For each test example
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report it’s label
Advantages of NN
Disadvantages of NN
Fast training time O(C)
Very easy to implement
Very slow classification time
Suffers from
the curse of dimensionality
Could be a good choice for
low-dimensional problems
Comparison with SVM
Disadvantages of SVM
Very slow classification time
Suffers from
the curse of dimensionality
For each test example
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report it’s label
Advantages of NN
Disadvantages of NN
Fast training time O(C)
Very easy to implement
Very slow classification time
Suffers from
the curse of dimensionality
Could be a good choice for
low-dimensional problems
Comparison with SVM
Disadvantages of SVM
Very slow classification time
Suffers from
the curse of dimensionality
For each test example
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report it’s label
Advantages of NN
Disadvantages of NN
Fast training time O(C)
Very easy to implement
Very slow classification time
Suffers from
the curse of dimensionality
Could be a good choice for
low-dimensional problems
Comparison with SVM
Disadvantages of SVM
Very slow classification time
Suffers from
the curse of dimensionality
For each test example
Instance Label
3
1.Compute pixel-wise
distance to all training
examples
2. Find the closest
training example
3. Report it’s label
Advantages of NN
Disadvantages of NN
Fast training time O(C)
Very easy to implement
Very slow classification time
Suffers from
the curse of dimensionality
Could be a good choice for
low-dimensional problems
Comparison with SVM
Disadvantages of SVM
Very slow classification time
Suffers from
the curse of dimensionality
It might be tricky to choose
the right kernel
Quiz time
Q:
How would you approach a multi-class classification task using
SVM?
Q:
How would you approach a multi-class classification task using
SVM?
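One common answer (an assumption here, since the slides do not spell it out) is to train one binary SVM per class, for example one-vs-rest:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# One "this digit vs all other digits" SVM per class; the most confident one wins.
multiclass_svm = OneVsRestClassifier(SVC(kernel="rbf")).fit(X_train, y_train)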
Pixel#215
Pixel #213
254
2540
0
Support Vector Machine (SVM)
Support Vector Machine (SVM)
Support Vector Machine (SVM)
Support Vector Machine (SVM)
100% accurate!
100% accurate!
accuracy =
correctly classified instances
total number of instances
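A small worked check of this formula (illustration only):

import numpy as np

def accuracy(y_true, y_pred):
    # correctly classified instances / total number of instances
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

print(accuracy([3, 6, 6, 3], [3, 6, 1, 3]))  # 3 out of 4 correct -> 0.75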
100% accurate!
Can we trust this model?
100% accurate!
Can we trust this model?
Consider the following example:
100% accurate!
Can we trust this model?
Consider the following example:
Whatever happens,
predict 0
100% accurate!
Can we trust this model?
Consider the following example:
Whatever happens,
predict 0
Accuracy = 49/50
100% accurate!
Can we trust this model?
Consider the following example:
Whatever happens,
predict 0
Accuracy = 98%
100% accurate!
Can we trust this model?
Consider the following example:
Count
Histogram could help you figure
out if your dataset is unbalanced
100% accurate!
Can we trust this model?
Consider the following example:
What if my data is unbalanced?
Count
Histogram could help you figure
out if your dataset is unbalanced
100% accurate!
Can we trust this model?
Consider the following example:
There are few ways, we are
going to discuss them later Count
What if my data is unbalanced?
Histogram could help you figure
out if your dataset is unbalanced
100% accurate!
Can we trust this model?
In our case data is balanced:
100% accurate!
100% accurate!
Can we trust this model?
We have balanced data:
100% accurate!
Can we trust this model?
We have balanced data:
100% accurate!
Can we trust this model?
We have balanced data:
100% accurate!
Can we trust this model?
We have balanced data:
😒
So, what happened?
100% accurate!
Training the model
Feature#2
Feature #1
Let’s add more examples
Training the model
Feature#2
Feature #1
Training the model
Still linearly separable
Feature#2
Feature #1
Still linearly separable
Training the model
Feature#2
Feature #1
Training the model
Feature#2
Feature #1
Training the model
Feature#2
Feature #1
How about now?
Feature#2
Feature #1
Training the model
Feature#2
Feature #1
Simple; not perfect fit Complicated; ideal fit
Which model should we use?
Training the model
Feature#2
Feature #1
Feature#2
Feature #1
Simple; not perfect fit Complicated; ideal fit
Training the model
Which model should we use?
Feature#2
Feature #1
Feature#2
Feature #1
Simple; not perfect fit Complicated; ideal fit
Training the model
Which model should we use?
Feature#2
Feature #1
Feature#2
Feature #1
Simple; not perfect fit Complicated; ideal fit
Training the model
Which model should we use?
Feature#2
Feature #1
Feature#2
Feature #1
So, what happened?
Overfitting
100% accurate!
So, what happened?
Too general
model
Just right! Overfitting
100% accurate!
So, what happened?
Too general
model
Just right! Overfitting
We should split our data into train and test sets
100% accurate!
Split into train and test
Split into train and test
Normally we would
split data into 80%
train and 20% test
sets
Split into train and test
Normally we would
split data into 80%
train and 20% test
sets
As we have a lot of
data we can afford
50/50 ratio
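A minimal sketch of such a split with scikit-learn (80/20 here; stratify keeps the class proportions the same in both parts):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)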
Split into train and test
Can we do better than 90%?
Normally we would
split data into 80%
train and 20% test
sets
As we have a lot of
data we can afford
50/50 ratio
Parameter tuning
Model
hyper-parameter
Pixel#215
Pixel #213
254
2540
0
Model
hyper-parameter
Pixel#215
Pixel #213
254
2540
0
C = 1
Pixel#215
Pixel #213
254
2540
0
In red are areas where
penalty is applied to
instances close to the line
C = 1
Pixel#215
Pixel #213
254
2540
0
In red are areas where
penalty is applied to
instances close to the line
In green are areas where
no penalty is applied
C = 1
Pixel#215
Pixel #213
254
2540
0
In red are areas where
penalty is applied to
instances close to the line
In green are areas where
no penalty is applied
Total amount of penalty applied to the classifier is called loss
Classifiers try to minimise loss by adjusting their parameters
C = 1
Pixel#215
Pixel #213
254
2540
0
In red are areas where
penalty is applied to
instances close to the line
In green are areas where
no penalty is applied
Total amount of penalty applied to the classifier is called loss
Classifiers try to minimise loss by adjusting their parameters
C = 1
This instance increases
the penalty
Pixel#215
Pixel #213
254
2540
0
Total amount of penalty applied to the classifier is called loss
Classifiers try to minimise loss by adjusting their parameters
Now it is in a green area
In red are areas where
penalty is applied to
instances close to the line
In green are areas where
no penalty is applied
C = 1
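For reference, a minimal NumPy sketch of the hinge loss that soft-margin SVMs minimise; labels are assumed to be -1/+1 and scores are signed distances from the hyper-plane:

import numpy as np

def hinge_loss(scores, labels):
    # 0 penalty for points on the correct side and outside the margin (green area);
    # points inside the margin or misclassified (red area) are penalised.
    return np.mean(np.maximum(0.0, 1.0 - labels * scores))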
Parameter tuning
Algorithm | Hyper-parameters
K-nearest neighbour | K, the number of neighbours (1,…,100)
Decision Tree | split metric ('gini', 'information gain')
Random Forest | number of trees (3,…,100, more is better), split metric ('gini', 'information gain')
SVM | C (10^-5,…,10^2) and gamma (10^-15,…,10^2)
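A hedged sketch of the kind of search described next: try several values of C and keep the one with the best validation score (X_val/y_val are an assumed held-out validation split):

from sklearn.svm import SVC

best_C, best_score = None, 0.0
for C in [1e-5, 1e-3, 1e-1, 1.0, 10.0, 100.0]:
    model = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    score = model.score(X_val, y_val)   # accuracy on the validation set
    if score > best_score:
        best_C, best_score = C, score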
Let’s try different C maybe
our score will improve
Let’s try different C maybe
our score will improve
Nope…
Let’s try different C maybe
our score will improve
Fail again…
Let’s try different C maybe
our score will improve
It is getting depressing…
Let’s try different C maybe
our score will improve
Hurrah!
Let’s try different C maybe
our score will improve
Hurrah!
You may not have noticed but…
Let’s try different C maybe
our score will improve
Hurrah!
You may not have noticed but…
We are overfitting again…
The whole
dataset 100%
Training 60%
The whole
dataset 100%
Training 60%
For fitting initial model
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
The whole
dataset 100%
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
5/7
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
5/7
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
5/7
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
5/7
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
7/7
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
7/7
Testing 20%
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
7/7
Testing 20%
For one shot evaluation
of trained model
5/5
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
7/7
Testing 20%
For one shot evaluation
of trained model
5/5
But what happens when you overfit
validation set?
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
Testing 20%
For one shot evaluation
of trained model
5/5 You're doing great!
🙂
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
Testing 20%
For one shot evaluation
of trained model
5/5 You're doing great!
🙂
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
Testing 20%
For one shot evaluation
of trained model
4/5 You're doing great!
🙂
The whole
dataset 100%
Training 60%
For fitting initial model
Validation 20%
For parameter tuning &
performance evaluation
Testing 20%
For one shot evaluation
of trained model
4/5 You're doing great!
🙂 😒
The whole dataset 100%
Cross Validation (CV) Algorithm
Training data 80%
Cross Validation (CV) Algorithm
Test 20%
Training data 80%
Cross Validation (CV) Algorithm
20%20%20% 20%
Training data 80%
Cross Validation (CV) Algorithm
Training data 80%
Cross Validation (CV) Algorithm
20%20%20% 20%
Train on 60% of data Validate on
20%
20%20%20% 20%
20%20%20% 20%
Training data 80%
Cross Validation (CV) Algorithm
TrainTrainTrain Val
Train on 60% of data Validate on
20%
Cross Validation (CV) Algorithm
0.75
20%20%20% 20%
Training data 80%
TrainTrainTrain Val
Train on 60% of data Validate on
20%
Cross Validation (CV) Algorithm
0.75
ValTrainTrain Train 0.85
20%20%20% 20%
Training data 80%
TrainTrainTrain Val
20%20%20% 20%
Training data 80%
Cross Validation (CV) Algorithm
0.75
ValTrainTrain Train 0.85
TrainTrainTrain Val
TrainValTrain Train 0.91
Cross Validation (CV) Algorithm
0.75
0.85
TrainTrainVal Train
0.91
0.68
20%20%20% 20%
Training data 80%
ValTrainTrain Train
TrainTrainTrain Val
TrainValTrain Train
TrainTrainVal Train
20%20%20% 20%
Training data 80%
ValTrainTrain Train
TrainTrainTrain Val
TrainValTrain Train
Cross Validation (CV) Algorithm
0.75
0.85
0.91
0.68
MEAN (0.75, 0.85, 0.91, 0.68) = ?
TrainTrainVal Train
20%20%20% 20%
Training data 80%
ValTrainTrain Train
TrainTrainTrain Val
TrainValTrain Train
Cross Validation (CV) Algorithm
0.75
0.85
0.91
0.68
MEAN (0.75, 0.85, 0.91, 0.68) ≈ 0.80
TrainTrainVal Train
20%20%20% 20%
Training data 80%
ValTrainTrain Train
TrainTrainTrain Val
TrainValTrain Train
Cross Validation (CV) Algorithm
0.75
0.85
0.91
0.68
MEAN (0.75, 0.85, 0.91, 0.68) ≈ 0.80
Choose the best model/parameters based on
this estimate and then apply it to the test set
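A minimal scikit-learn sketch of this 4-fold cross-validation (illustrative; the actual fold scores will of course differ):

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

scores = cross_val_score(SVC(C=1.0), X_train, y_train, cv=4)  # 4 folds: train on 3, validate on 1
print(scores, scores.mean())  # e.g. folds like 0.75, 0.85, 0.91, 0.68 average to about 0.80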
Machine Learning pipeline
Raw Data
Machine Learning pipeline
Raw Data Preprocessing
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Split into
train & test
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Split into
train & test
test set
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Split into
train & test
test set
Choose a model
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Split into
train & test
test set
Choose a model
Find best
parameters
using CV
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Split into
train & test
test set
Choose a model
Find best
parameters
using CV
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Split into
train & test
test set
Choose a model
Find best
parameters
using CV
Train the model
on the whole
training set
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Split into
train & test
test set
Choose a model
Find best
parameters
using CV
Train the model
on the whole
training set
Evaluate final
model on
the test set
test set
Machine Learning pipeline
Raw Data Preprocessing
Feature
extraction
Split into
train & test
test set
Choose a model
Find best
parameters
using CV
Train the model
on the whole
training set
Evaluate final
model on
the test set
test set
Machine Learning pipeline
Report your results
Raw Data Preprocessing
Feature
extraction
Split into
train & test
test set
Choose a model
Find best
parameters
using CV
Train the model
on the whole
training set
Evaluate final
model on
the test set
test set
Machine Learning pipeline
Report your results
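Putting the last few steps together, here is a hedged end-to-end sketch with scikit-learn: preprocessing, CV-based parameter search on the training set, then a one-shot evaluation on the held-out test set:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
search = GridSearchCV(pipeline,
                      {"svm__C": [0.1, 1, 10], "svm__gamma": [1e-3, 1e-2]},
                      cv=5)                      # find best parameters using CV
search.fit(X_train, y_train)                     # refit on the whole training set
print(search.score(X_test, y_test))              # evaluate the final model on the test set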
Problem
A machine learning algorithm usually
corresponds to a combination of the
following 3 elements
The choice of a specific mapping function family F (K-NN,
SVM, DT, RF, Neural Networks etc.).
A machine learning algorithm usually
corresponds to a combination of the
following 3 elements
Way to evaluate the quality of a function f out of F. Ways of
saying how bad/good this function f is doing in classifying
real world objects.
The choice of a specific mapping function family F (K-NN,
SVM, DT, RF, Neural Networks etc.).
A machine learning algorithm usually
corresponds to a combination of the
following 3 elements
a way to search for a better function f out of F. How to
choose parameters so that the performance of f would
improve.
Way to evaluate the quality of a function f out of F. Ways of
saying how bad/good this function f is doing in classifying
real world objects.
The choice of a specific mapping function family F (K-NN,
SVM, DT, RF, Neural Networks etc.).
https://github.com/sugyan/tensorflow-mnist
References
• Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-
learning)
• Introduction to Machine Learning by Pascal Vincent given at Deep Learning
Summer School, Montreal 2015 (http://videolectures.net/
deeplearning2015_vincent_machine_learning/)
• Welcome to Machine Learning by Konstantin Tretyakov delivered at AACIMP
Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf)
• Stanford CS class: Convolutional Neural Networks for Visual Recognition by
Andrej Karpathy (http://cs231n.github.io/)
• Data Mining Course by Jaak Vilo at University of Tartu (https://courses.cs.ut.ee/
MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf)
• Machine Learning Essential Concepts by Ilya Kuzovkin (https://
www.slideshare.net/iljakuzovkin)
• From the brain to deep learning and back by Raul Vicente Zafra and Ilya
Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)
www.biit.cs.ut.ee www.ut.ee www.quretec.ee
You, guys, rock!

1 Supervised learning

  • 37. Regression task (housing price example: price in 1000’s ($) as a function of size in m2).
  • 38. Supervised Learning: Classification (e.g. benign vs malignant tumour given tumour size) VS Regression (e.g. predicting a house price from its size in m2).
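A toy sketch contrasting the two supervised settings above; the numbers below are made up purely for illustration and are not from the slides.

# Regression vs classification on made-up toy data.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous price (in 1000's $) from house size (m2)
sizes = np.array([[100], [200], [300], [400], [500]])
prices = np.array([100, 180, 250, 330, 400])
reg = LinearRegression().fit(sizes, prices)
print("Predicted price for 250 m2:", reg.predict([[250]]))

# Classification: predict a discrete label (0 = benign, 1 = malignant)
tumour_size = np.array([[1.0], [1.5], [2.0], [3.5], [4.0], [5.0]])
malignant = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(tumour_size, malignant)
print("Predicted class for tumour size 3.0:", clf.predict([[3.0]]))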
  • 39. Q: You are running a company which has two problems, namely: 1. For each user in the database, predict if this user will continue using your company’s product or will move to competitors (churn). 2. Predict the profit of your company at the end of this year based on previous records. How would you approach these problems? a. Both problems are examples of classification problems b. The first one is a classification task and the second one a regression problem c. The first one is a regression problem and the second one a classification task d. Both problems are regression problems
  • 40. (The first problem is a classification task and the second a regression problem, i.e. option b.)
  • 41. Unsupervised Learning examples: clustering of Google queries, gene expression clustering. In contrast to supervised learning, we do not have labels for our classes (the two-feature plots from the previous examples become unlabelled). Quiz question: “Of the following examples, which would you address using an unsupervised learning algorithm?”
  • 44. Unsupervised Learning (tumour size vs age, without labels): Is there any interesting hidden structure in this data?
  • 45. What does this hidden structure correspond to?
  • 56–60. Quiz:
Q1: Some telecommunication company wants to segment their customers into distinct groups in order to send appropriate subscription offers; this is an example of clustering.
Q2: You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake; this is an example of regression.
Q3: Assume you want to perform supervised learning and to predict the number of newborns according to the size of the storks’ population (http://www.brixtonhealth.com/storksBabies.pdf); it is an example of stupidity regression.
Q4: Discriminating between spam and ham e-mails is a classification task.
  • 62–71. MNIST dataset (10000 images, can be downloaded from http://yann.lecun.com/exdb/mnist/). Each instance is a 28px × 28px image of a handwritten digit, flattened into a vector of 784 pixel values, e.g. (0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 …) with label 3. A few instances and their labels:
0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … → 3
0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … → 6
0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … → 1
0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … → 8
0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … → 1
Each row is an instance; its 784 pixel values are the features (together forming a feature vector). Features are also sometimes referred to as dimensions, so these images are 784-dimensional.
  • 72–73. Data is loaded. What should we do now? We would like to build a tool that would be able to automatically recognise handwritten digits. Let’s get to the first algorithm.
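A minimal sketch of loading MNIST as feature vectors and labels. Assumption: instead of parsing the raw files from yann.lecun.com, we fetch the standard 70000-image MNIST via OpenML and keep a 10000-image subset to match the slides.

from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X, y = X[:10000], y[:10000]      # 10000 instances, as in the slides

print(X.shape)   # (10000, 784) -> 784 pixel values (features/dimensions) per image
print(y[:5])     # string labels such as '5', '0', '4', ...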
  • 74–83. How can we quantitatively say which of these pairs of images is more similar: A & B, or A & C? What about computing their pixel-wise difference? Σ(i=1..784) |Ai − Bi| = 137.03 versus Σ(i=1..784) |Ai − Ci| = 107.38, so A is more similar (closer) to C than to B.
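A small sketch of the pixel-wise (L1, or Manhattan) distance used above. Assumption: image_a, image_b, image_c are hypothetical names for flattened 784-pixel images scaled to [0, 1], which is why the distances quoted on the slide are around 100 rather than tens of thousands.

import numpy as np

def l1_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of absolute pixel-wise differences between two flattened images."""
    return float(np.sum(np.abs(a - b)))

# usage: the smaller the distance, the more similar the images
# d_ab = l1_distance(image_a, image_b)
# d_ac = l1_distance(image_a, image_c)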
  • 84–92. Nearest Neighbour classifier. We asked our friend to write a bunch of new digits so that we would have something to recognise. For each new instance (label unknown): 1. Compute the pixel-wise distance to all training examples in the dataset. 2. Find the closest training example. 3. Report its label (here: 3).
  • 93–96. Advantages of NN: very easy to implement; could be a good choice for low-dimensional problems. Disadvantages of NN: very slow classification time; suffers from the curse of dimensionality.
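A from-scratch sketch of the three steps above; X_train, y_train and x_new are hypothetical names for the training images, their labels and a new flattened image. (scikit-learn's KNeighborsClassifier(n_neighbors=1) does the same thing.)

import numpy as np

def nn_predict(x_new: np.ndarray, X_train: np.ndarray, y_train: np.ndarray):
    # 1. Compute the pixel-wise (L1) distance to all training examples
    distances = np.abs(X_train - x_new).sum(axis=1)
    # 2. Find the closest training example
    closest = np.argmin(distances)
    # 3. Report its label
    return y_train[closest]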
  • 97–99. Curse of dimensionality: remember we said that our instances are 784-dimensional? This is a lot!
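An illustrative sketch (not from the slides) of why high dimensionality hurts distance-based methods: for random points, the nearest and farthest distances become almost indistinguishable as the number of dimensions grows, so "the closest example" carries less and less information.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 784):
    points = rng.random((1000, d))                          # 1000 random points in d dimensions
    dists = np.abs(points - points[0]).sum(axis=1)[1:]      # L1 distances to the first point
    print(f"d={d}: nearest/farthest distance ratio = {dists.min() / dists.max():.2f}")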
  • 102–104. NN recap. Advantages: fast training time O(C); very easy to implement; could be a good choice for low-dimensional problems. Disadvantages: very slow classification time; suffers from the curse of dimensionality. NN is rarely used in practice. Can we find a better algorithm?
  • 105–113. Back to binary classification (for a sec): 3 VS 6, using the same feature vectors but with labels 3, 6, 6, 3, 6. We can split the instances on individual pixel values, e.g. pixel #213 > 163 vs <= 163, and within a branch split again on pixel #216 > 30 vs <= 30. Such a hierarchy of splits is a decision tree. How do you know which features to use for the best splits? Using various goodness metrics, such as information gain or Gini impurity, to define “best”.
  • 114–120. Decision (classification) tree algorithm. 1. Construct a decision tree based on the training examples (e.g. split on pixel #213 > 163 / <= 163, then on pixel #216 > 30 / <= 30). Then, for each new instance: 2. Make the corresponding comparisons. 3. Report the label (here: 6). With depth = 2, once the tree is constructed at most 2 comparisons are needed to test a new example, so in general decision trees are *always faster than the NN algorithm (*remember, shit happens).
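A minimal sketch of training and using a decision tree on the same kind of data; X_train, y_train and x_new are the same hypothetical names as in the NN sketch above, and the depth limit and criterion are illustrative choices.

from sklearn.tree import DecisionTreeClassifier

# 1. Construct a decision tree from the training examples
tree = DecisionTreeClassifier(max_depth=2, criterion="gini")   # or "entropy" for information gain
tree.fit(X_train, y_train)

# 2.-3. For a new instance: follow the learned comparisons and report the label
print(tree.predict(x_new.reshape(1, -1)))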
  • 121–125. Can we find a better algorithm? Disadvantages of NN: very slow classification time; suffers from the curse of dimensionality. Disadvantages of DT: also suffers from the curse of dimensionality. Is there a way to break the curse?
  • 126–131. The decision tree algorithm is non-parametric and deterministic: the shape of the tree is determined by the data, not by our choice, which means that we will always get the same output given the same input. But are all input dimensions equally important for classification? How about building a lot of trees from random parts of the data and then merging their predictions? This is the Random forest algorithm.
  • 132. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm
  • 133. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm Randomly discard some rows
  • 134. Randomly discard some rows and columns Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm
  • 135. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm Build a decision tree based on remaining data pixel #213 > 163 <= 163 pixel #216 > 0 = 0
  • 136. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 Build a decision tree based on remaining data Repeat N times until N trees are constructed
  • 137. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163
  • 138. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 254 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30
  • 139. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label ? For each new instance
  • 140. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label ? For each new instance Use all constructed trees to generate predictions
  • 141. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label ? For each new instance Predictions Tree #2 Tree #1 Tree #3
  • 142. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label ? For each new instance Predictions Tree #1 Tree #2 Tree #3 Average 2/3
  • 143. Random forest algorithm pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30 Instance Label 6 For each new instance Predictions Tree #1 Tree #2 Tree #3 Average 2/3 = 66.6%
  • 144. Random forest algorithm Instance Label 6 For each new instance Predictions Tree #2 Tree #1 Tree #3 Average 2/3 = 66.6% Quiz time pixel #213 > 163 <= 163 pixel #216 > 0 = 0 pixel #213 > 163 <= 163 pixel #214 > 253 <= 253 pixel #216 > 30 <= 30
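A minimal sketch of the procedure walked through above (my code, assuming scikit-learn and numpy; the bundled digits dataset stands in for the slides' handwritten digits): each tree sees a random subset of rows and columns, and the forest's answer is the majority vote over the trees.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = load_digits(return_X_y=True)
    n_trees, n_rows, n_cols = 10, 500, 20

    trees, col_subsets = [], []
    for _ in range(n_trees):
        rows = rng.choice(len(X), size=n_rows, replace=True)        # randomly keep some rows
        cols = rng.choice(X.shape[1], size=n_cols, replace=False)   # ... and some columns
        trees.append(DecisionTreeClassifier().fit(X[rows][:, cols], y[rows]))
        col_subsets.append(cols)

    def forest_predict(x):
        # Ask every tree for its label and report the most common answer.
        votes = [t.predict(x[cols].reshape(1, -1))[0] for t, cols in zip(trees, col_subsets)]
        return max(set(votes), key=votes.count)

    print(forest_predict(X[0]), y[0])

In practice sklearn.ensemble.RandomForestClassifier implements this idea (with a few refinements) out of the box.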
  • 145. Q1: Which classification algorithm(s) has (have) the following weaknesses: • It takes more time to train the classifier than to classify a new instance • It suffers from the curse of dimensionality A. Nearest neighbour algorithm B. Decision tree C. Random forest algorithm D. None of the above E. All of the above
  • 146. Q1: A. Nearest neighbour algorithm B. Decision tree C. Random forest algorithm D. None of the above E. All of the above • It takes more time to train the classifier than to classify a new example • It suffers from the curse of dimensionality pixel #213 > 163 <= 163 pixel #216 > 0 = 0 Which classification algorithm(s) has (have) the following weaknesses:
  • 147. Q2: A. Prohibitively slow running time at training given a lot of data B. Highly biased classification due to the prevalence of one of the classes C. High classification error due to an excessively complex classifier D. Poor performance of a classifier trained on data with a large number of features E. None of the above Which of the following statements best defines the curse of dimensionality?
  • 148. Q2: Which of the following statements best defines the curse of dimensionality? A. Prohibitively slow running time at training given a lot of data B. Highly biased classification due to the prevalence of one of the classes C. High classification error due to an excessively complex classifier D. Poor performance of a classifier trained on data with a large number of features E. None of the above
  • 149. Q3: Which of the following algorithms would you prefer if you had to classify instances from low-dimensional data? A. Nearest neighbour algorithm B. Decision tree algorithm C. Random forest algorithm D. All mentioned would cope E. None of the above are suitable
  • 150. A. Nearest neighbour algorithm B. Decision tree algorithm C. Random forest algorithm D. All mentioned would cope E. None of the above are suitable Q3: Which of the following algorithm(s) would you prefer if you had to classify instances from low-dimensional data?
  • 152. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 202 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Let us go primitive, and focus only on two pixels
  • 153. Feature vectors Labels 0 0 0 0 0 0 0 31 132 254 253 254 213 82 0 0 0 0 0 0 … 3 0 0 0 0 0 0 0 25 142 254 254 193 30 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 123 254 87 0 0 0 0 0 0 0 0 0 … 6 0 0 0 0 0 0 0 0 59 163 254 202 254 194 112 18 0 0 0 0 … 3 0 0 0 0 0 0 0 0 19 227 254 84 0 0 0 0 0 0 0 0 … 6 Instances Let us go primitive, and focus only on two pixels It does not really matter which ones. I will take these two because we have gotten used to them already :)
  • 154. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Let us go primitive, and focus only on two pixels
  • 155. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Now, let's visualise them on a 2-D plot (axes: Pixel #215 vs Pixel #213, 0-254)
  • 156. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Now, let's visualise them on a 2-D plot (axes: Pixel #215 vs Pixel #213, 0-254)
  • 157. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Now, let's visualise them on a 2-D plot (axes: Pixel #215 vs Pixel #213, 0-254)
  • 158. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Now, let's visualise them on a 2-D plot (axes: Pixel #215 vs Pixel #213, 0-254)
  • 159. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances Support Vector Machine (SVM) (axes: Pixel #215 vs Pixel #213, 0-254)
  • 160. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane A B C Is it A, B or C? Support Vector Machine (SVM)
  • 161. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Support Vector Machine (SVM)
  • 162. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Margin Support Vector Machine (SVM)
  • 163. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Margin Support Vector Machine (SVM)
  • 164. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Support Vector Machine (SVM) The closest points, which define the hyperplane, are called support vectors
  • 165. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane 3. The larger the distance from the hyperplane to an instance, the more confident the classifier is about its prediction more confidence Support Vector Machine (SVM) The closest points, which define the hyperplane, are called support vectors
  • 166. Features Labels 254 254 3 254 193 6 254 0 6 163 202 3 227 84 6 Instances (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Support Vector Machine (SVM) 3. The larger the distance from the hyperplane to an instance, the more confident the classifier is about its prediction The closest points, which define the hyperplane, are called support vectors
  • 167. (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Support Vector Machine (SVM) 3. The larger the distance from the hyperplane to an instance, the more confident the classifier is about its prediction The closest points, which define the hyperplane, are called support vectors Instance Label ? For each new instance
  • 168. (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Support Vector Machine (SVM) 3. The larger the distance from the hyperplane to an instance, the more confident the classifier is about its prediction The closest points, which define the hyperplane, are called support vectors Instance Label ? For each new instance
  • 169. (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Support Vector Machine (SVM) 3. The larger the distance from the hyperplane to an instance, the more confident the classifier is about its prediction The closest points, which define the hyperplane, are called support vectors Instance Label 6 For each new instance
  • 170. (axes: Pixel #215 vs Pixel #213, 0-254) 1. Identify the right hyperplane 2. Maximise the distance between the nearest points and the hyperplane Support Vector Machine (SVM) 3. The larger the distance from the hyperplane to an instance, the more confident the classifier is about its prediction The closest points, which define the hyperplane, are called support vectors Instance Label 6 For each new instance
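A minimal sketch of a linear SVM on the two-pixel toy features from the table above (my code, assuming scikit-learn; the new instance is made up for illustration): fit the maximum-margin hyperplane, inspect the support vectors, and use the signed distance to the hyperplane as a confidence score.

    import numpy as np
    from sklearn.svm import SVC

    # Pixel #213 and Pixel #215 values from the slide's table, labels 3 and 6.
    X = np.array([[254, 254], [163, 202],            # class "3"
                  [254, 193], [254, 0], [227, 84]])  # class "6"
    y = np.array([3, 3, 6, 6, 6])

    clf = SVC(kernel="linear").fit(X, y)

    print(clf.support_vectors_)            # the closest points, which define the hyperplane
    new_instance = np.array([[240, 60]])   # a hypothetical new digit's two pixel values
    print(clf.predict(new_instance))           # predicted label (here most likely 6)
    print(clf.decision_function(new_instance)) # larger magnitude = more confident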
  • 171. Support Vector Machine (SVM) (axes: Pixel #215 vs Pixel #213, 0-254) What should we do now?
  • 172. Support Vector Machine (SVM) (axes: y vs x, 0-254) Let's make another dimension: z = a·x² + b·y²
  • 173. Support Vector Machine (SVM) (axes: y vs x, 0-254) Let's make another dimension: z = a·x² + b·y² (axes: z vs x)
  • 174. Support Vector Machine (SVM) (axes: y vs x, 0-254) Let's make another dimension: z = a·x² + b·y² (axes: z vs x)
  • 175. Support Vector Machine (SVM) (axes: y vs x, 0-254) Let's make another dimension: z = a·x² + b·y² (axes: z vs x)
  • 176. Support Vector Machine (SVM) (axes: y vs x, 0-254) Let's make another dimension: z = a·x² + b·y² (axes: z vs x) This transformation is called the kernel trick and the function z is the kernel
  • 177. Support Vector Machine (SVM) (axes: y vs x, 0-254) Let's make another dimension: z = a·x² + b·y² (axes: z vs x) This transformation is called the kernel trick and the function z is the kernel Wow, wow, wow, hold on! How does this actually work?
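A minimal sketch of the idea (my code; the ring-shaped toy data and the choice a = b = 1 are mine, not the slides'): points that are not linearly separable in (x, y) become separable once the dimension z = a·x² + b·y² is added, and a kernel lets the SVM get the same effect without building z explicitly.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    angles = rng.uniform(0, 2 * np.pi, 200)
    inner = np.c_[np.cos(angles[:100]), np.sin(angles[:100])] * 1.0   # class 0: small circle
    outer = np.c_[np.cos(angles[100:]), np.sin(angles[100:])] * 3.0   # class 1: big circle
    X = np.vstack([inner, outer])
    y = np.array([0] * 100 + [1] * 100)

    a, b = 1.0, 1.0
    z = a * X[:, 0] ** 2 + b * X[:, 1] ** 2     # the extra dimension from the slide
    X3 = np.c_[X, z]

    print(SVC(kernel="linear").fit(X, y).score(X, y))    # poor: not linearly separable in 2-D
    print(SVC(kernel="linear").fit(X3, y).score(X3, y))  # ~1.0: separable once z is added
    print(SVC(kernel="rbf").fit(X, y).score(X, y))       # a kernel achieves this implicitly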
  • 178. For each test example Instance Label 3 1. Compute the pixel-wise distance to all training examples 2. Find the closest training example 3. Report its label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Comparison with SVM Disadvantages of SVM Very slow classification time Suffers from the curse of dimensionality
  • 179. For each test example Instance Label 3 1. Compute the pixel-wise distance to all training examples 2. Find the closest training example 3. Report its label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Comparison with SVM Disadvantages of SVM Very slow classification time Suffers from the curse of dimensionality
  • 180. For each test example Instance Label 3 1. Compute the pixel-wise distance to all training examples 2. Find the closest training example 3. Report its label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Comparison with SVM Disadvantages of SVM Very slow classification time Suffers from the curse of dimensionality
  • 181. For each test example Instance Label 3 1. Compute the pixel-wise distance to all training examples 2. Find the closest training example 3. Report its label Advantages of NN Disadvantages of NN Fast training time O(C) Very easy to implement Very slow classification time Suffers from the curse of dimensionality Could be a good choice for low-dimensional problems Comparison with SVM Disadvantages of SVM Very slow classification time Suffers from the curse of dimensionality It might be tricky to choose the right kernel
  • 183. Q: How would you approach a multiclass classification task using SVM?
  • 184. Q: How would you approach a multiclass classification task using SVM? (axes: Pixel #215 vs Pixel #213, 0-254)
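One common answer (my sketch, not the slide's): train one binary SVM per class ("one-vs-rest") and pick the class whose SVM is most confident; scikit-learn's SVC also handles multiclass problems out of the box (internally one-vs-one).

    from sklearn.datasets import load_digits
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)     # 10 classes: digits 0-9

    ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
    print(ovr.predict(X[:5]), y[:5])        # one binary SVM per digit behind the scenes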
  • 188. Support Vector Machine (SVM) 100% accurate!
  • 190. accuracy = correctly classified instances / total number of instances 100% accurate!
  • 191. Can we trust this model? 100% accurate!
  • 192. Can we trust this model? Consider the following example: 100% accurate!
  • 193. Can we trust this model? Consider the following example: Whatever happens, predict 0 100% accurate!
  • 194. Can we trust this model? Consider the following example: Whatever happens, predict 0 Accuracy = 49/50 100% accurate!
  • 195. Can we trust this model? Consider the following example: Whatever happens, predict 0 Accuracy = 98% 100% accurate!
  • 196. Can we trust this model? Consider the following example: A histogram of class counts could help you figure out if your dataset is unbalanced 100% accurate!
  • 197. Can we trust this model? Consider the following example: What if my data is unbalanced? A histogram of class counts could help you figure out if your dataset is unbalanced 100% accurate!
  • 198. Can we trust this model? Consider the following example: There are a few ways, we are going to discuss them later What if my data is unbalanced? A histogram of class counts could help you figure out if your dataset is unbalanced 100% accurate!
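A minimal sketch of the point above (the 49-out-of-50 numbers come from the slide; the rest is illustrative): a classifier that always predicts 0 looks 98% accurate on unbalanced labels, and a quick class count, the "histogram", reveals why that number means little.

    import numpy as np

    y_true = np.array([0] * 49 + [1])     # 49 instances of class 0, 1 instance of class 1
    y_pred = np.zeros_like(y_true)        # "whatever happens, predict 0"

    accuracy = (y_pred == y_true).mean()
    print(accuracy)                        # 0.98, yet the model is useless for class 1
    print(np.bincount(y_true))             # [49  1]: the dataset is unbalanced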
  • 199. Can we trust this model? In our case data is balanced: 100% accurate!
  • 200. 100% accurate! Can we trust this model? We have balanced data:
  • 201. 100% accurate! Can we trust this model? We have balanced data:
  • 202. 100% accurate! Can we trust this model? We have balanced data:
  • 203. 100% accurate! Can we trust this model? We have balanced data: 😒
  • 205. Training the model Feature#2 Feature #1 Let’s add more examples
  • 207. Training the model Still linearly separable Feature#2 Feature #1
  • 208. Still linearly separable Training the model Feature#2 Feature #1
  • 211. Feature#2 Feature #1 Training the model Feature#2 Feature #1
  • 212. Simple; not a perfect fit Complicated; ideal fit Which model should we use? Training the model Feature#2 Feature #1 Feature#2 Feature #1
  • 213. Simple; not a perfect fit Complicated; ideal fit Training the model Which model should we use? Feature#2 Feature #1 Feature#2 Feature #1
  • 214. Simple; not a perfect fit Complicated; ideal fit Training the model Which model should we use? Feature#2 Feature #1 Feature#2 Feature #1
  • 215. Simple; not a perfect fit Complicated; ideal fit Training the model Which model should we use? Feature#2 Feature #1 Feature#2 Feature #1
  • 217. So, what happened? Too general model Just right! Overfitting 100% accurate!
  • 218. So, what happened? Too general model Just right! Overfitting We should split our data into train and test sets 100% accurate!
  • 219. Split into train and test
  • 220. Split into train and test Normally we would split data into 80% train and 20% test sets
  • 221. Split into train and test Normally we would split data into 80% train and 20% test sets As we have a lot of data we can afford 50/50 ratio
  • 222. Split into train and test Can we do better than 90%? Normally we would split data into 80% train and 20% test sets As we have a lot of data we can afford 50/50 ratio
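A minimal sketch of the split described above (my code, assuming scikit-learn; the digits dataset is a stand-in): hold part of the data out so the model is evaluated on examples it has never seen. test_size=0.2 gives the usual 80/20 split; with plenty of data, 0.5 is also an option, as the slide notes.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = SVC().fit(X_train, y_train)      # train only on the training portion
    print(clf.score(X_test, y_test))       # accuracy on unseen data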
  • 227. (axes: Pixel #215 vs Pixel #213, 0-254) In red are areas where a penalty is applied to instances close to the line C = 1
  • 228. (axes: Pixel #215 vs Pixel #213, 0-254) In red are areas where a penalty is applied to instances close to the line In green are areas where no penalty is applied C = 1
  • 229. (axes: Pixel #215 vs Pixel #213, 0-254) In red are areas where a penalty is applied to instances close to the line In green are areas where no penalty is applied The total amount of penalty applied to the classifier is called the loss Classifiers try to minimise the loss by adjusting their parameters C = 1
  • 230. (axes: Pixel #215 vs Pixel #213, 0-254) In red are areas where a penalty is applied to instances close to the line In green are areas where no penalty is applied The total amount of penalty applied to the classifier is called the loss Classifiers try to minimise the loss by adjusting their parameters C = 1 This instance increases the penalty
  • 231. (axes: Pixel #215 vs Pixel #213, 0-254) The total amount of penalty applied to the classifier is called the loss Classifiers try to minimise the loss by adjusting their parameters Now it is in a green area In red are areas where a penalty is applied to instances close to the line In green are areas where no penalty is applied C = 1
  • 232. Parameter tuning. Algorithm and its hyper-parameters: K-nearest neighbour: K, the number of neighbours (1, …, 100). Decision Tree: metric ('gini', 'information gain'). Random Forest: number of trees (3, …, 100; more is usually better) and metric ('gini', 'information gain'). SVM: C (10⁻⁵, …, 10²) and gamma (10⁻¹⁵, …, 10²)
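The same table written as search spaces (my sketch; the parameter names follow scikit-learn's conventions, and the ranges are the ones from the slide, reasonable defaults rather than hard rules).

    import numpy as np

    search_spaces = {
        "KNeighborsClassifier":   {"n_neighbors": list(range(1, 101))},
        "DecisionTreeClassifier": {"criterion": ["gini", "entropy"]},    # "entropy" = information gain
        "RandomForestClassifier": {"n_estimators": list(range(3, 101)),  # more trees usually helps
                                   "criterion": ["gini", "entropy"]},
        "SVC":                    {"C": np.logspace(-5, 2, 8),           # 1e-5 ... 1e2
                                   "gamma": np.logspace(-15, 2, 18)},    # 1e-15 ... 1e2
    }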
  • 233.
  • 234. Let’s try different C maybe our score will improve
  • 235. Let’s try different C maybe our score will improve Nope…
  • 236. Let’s try different C maybe our score will improve Fail again…
  • 237. Let’s try different C maybe our score will improve It is getting depressive…
  • 238. Let’s try different C maybe our score will improve Hurrah!
  • 239. Let’s try different C maybe our score will improve Hurrah! You may not have noticed but…
  • 240. Let’s try different C maybe our score will improve Hurrah! You may not have noticed but… We are overfitting again…
  • 243. Training 60% For fitting initial model The whole dataset 100%
  • 244. Training 60% For fitting initial model Validation 20% The whole dataset 100%
  • 245. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 5/7
  • 246. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 5/7
  • 247. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 5/7
  • 248. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 5/7
  • 249. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 7/7
  • 250. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 7/7 Testing 20%
  • 251. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 7/7 Testing 20% For one shot evaluation of trained model 5/5
  • 252. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation 7/7 Testing 20% For one shot evaluation of trained model 5/5 But what happens when you overfit the validation set?
  • 253. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation Testing 20% For one shot evaluation of trained model 5/5 You're doing great! 🙂
  • 254. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation Testing 20% For one shot evaluation of trained model 5/5 You're doing great! 🙂
  • 255. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation Testing 20% For one shot evaluation of trained model 4/5 You're doing great! 🙂
  • 256. The whole dataset 100% Training 60% For fitting initial model Validation 20% For parameter tuning & performance evaluation Testing 20% For one shot evaluation of trained model 4/5 You're doing great! 🙂 😒
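A minimal sketch of the 60/20/20 split above (my code, assuming scikit-learn; the list of C values is illustrative): fit on the training part, tune on the validation part, and touch the test set only once at the very end.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)

    # First put 20% aside as the untouched test set ...
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
    # ... then split the remaining 80% into 60% train / 20% validation (0.25 * 0.8 = 0.2).
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    best_C, best_score = None, -1.0
    for C in [0.01, 0.1, 1, 10, 100]:        # tune on the validation set only
        score = SVC(C=C).fit(X_train, y_train).score(X_val, y_val)
        if score > best_score:
            best_C, best_score = C, score

    final = SVC(C=best_C).fit(X_train, y_train)
    print(final.score(X_test, y_test))        # the one-shot evaluation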
  • 257. The whole dataset 100% Cross Validation (CV) Algorithm
  • 258. Training data 80% Cross Validation (CV) Algorithm Test 20%
  • 259. Training data 80% Cross Validation (CV) Algorithm
  • 260. 20%20%20% 20% Training data 80% Cross Validation (CV) Algorithm
  • 261. Training data 80% Cross Validation (CV) Algorithm 20%20%20% 20% Train on 60% of data Validate on 20% 20%20%20% 20%
  • 262. 20%20%20% 20% Training data 80% Cross Validation (CV) Algorithm TrainTrainTrain Val Train on 60% of data Validate on 20%
  • 263. Cross Validation (CV) Algorithm 0.75 20%20%20% 20% Training data 80% TrainTrainTrain Val Train on 60% of data Validate on 20%
  • 264. Cross Validation (CV) Algorithm 0.75 ValTrainTrain Train 0.85 20%20%20% 20% Training data 80% TrainTrainTrain Val
  • 265. 20%20%20% 20% Training data 80% Cross Validation (CV) Algorithm 0.75 ValTrainTrain Train 0.85 TrainTrainTrain Val TrainValTrain Train 0.91
  • 266. Cross Validation (CV) Algorithm 0.75 0.85 TrainTrainVal Train 0.91 0.68 20%20%20% 20% Training data 80% ValTrainTrain Train TrainTrainTrain Val TrainValTrain Train
  • 267. TrainTrainVal Train 20%20%20% 20% Training data 80% ValTrainTrain Train TrainTrainTrain Val TrainValTrain Train Cross Validation (CV) Algorithm 0.75 0.85 0.91 0.68 MEAN (0.75, 0.85, 0.91, 0.68) = ?
  • 268. TrainTrainVal Train 20%20%20% 20% Training data 80% ValTrainTrain Train TrainTrainTrain Val TrainValTrain Train Cross Validation (CV) Algorithm 0.75 0.85 0.91 0.68 MEAN (0.75, 0.85, 0.91, 0.68) ≈ 0.80
  • 269. TrainTrainVal Train 20%20%20% 20% Training data 80% ValTrainTrain Train TrainTrainTrain Val TrainValTrain Train Cross Validation (CV) Algorithm 0.75 0.85 0.91 0.68 MEAN (0.75, 0.85, 0.91, 0.68) ≈ 0.80 Choose the best model/parameters based on this estimate and then apply it to the test set
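A minimal sketch of the 4-fold procedure drawn above (my code, assuming scikit-learn; the fold scores on the slides are illustrative, so this just shows the mechanics): split the training data into folds, rotate which fold plays the validation role, and average the scores.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    scores = cross_val_score(SVC(C=1), X_train, y_train, cv=4)  # 4 folds = 4 validation scores
    print(scores, scores.mean())   # pick the model/parameters with the best mean CV score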
  • 272. Raw Data Preprocessing Machine Learning pipeline
  • 274. Raw Data Preprocessing Feature extraction Split into train & test Machine Learning pipeline
  • 275. Raw Data Preprocessing Feature extraction Split into train & test test set Machine Learning pipeline
  • 276. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Machine Learning pipeline
  • 277. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Machine Learning pipeline
  • 278. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Machine Learning pipeline
  • 279. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Train the model on the whole training set Machine Learning pipeline
  • 280. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Train the model on the whole training set Evaluate final model on the test set test set Machine Learning pipeline
  • 281. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Train the model on the whole training set Evaluate final model on the test set test set Machine Learning pipeline Report your results
  • 282. Raw Data Preprocessing Feature extraction Split into train & test test set Choose a model Find best parameters using CV Train the model on the whole training set Evaluate final model on the test set test set Machine Learning pipeline Report your results Problem
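A minimal end-to-end sketch of the pipeline above (my code, assuming scikit-learn; the scaling step and the parameter grid are illustrative choices, not the slides'): preprocess, split, choose a model, tune it with cross-validation on the training set, refit on the whole training set, and evaluate once on the held-out test set.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Raw data, preprocessing and feature extraction are already done for this toy dataset.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    pipe = Pipeline([("scale", StandardScaler()),    # preprocessing
                     ("svm", SVC())])                # chosen model
    grid = GridSearchCV(pipe,
                        {"svm__C": [0.1, 1, 10, 100],
                         "svm__gamma": [1e-4, 1e-3, 1e-2]},
                        cv=5)                        # find the best parameters using CV
    grid.fit(X_train, y_train)                       # then refit the best model on all training data

    print(grid.best_params_)
    print(grid.score(X_test, y_test))                # evaluate the final model on the test set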
  • 283. A machine learning algorithm usually corresponds to a combination of the following 3 elements: The choice of a specific mapping function family F (K-NN, SVM, DT, RF, Neural Networks etc.).
  • 284. A machine learning algorithm usually corresponds to a combination of the following 3 elements: A way to evaluate the quality of a function f from F: a way of saying how badly or well this function f is doing at classifying real-world objects. The choice of a specific mapping function family F (K-NN, SVM, DT, RF, Neural Networks etc.).
  • 285. A machine learning algorithm usually corresponds to a combination of the following 3 elements: A way to search for a better function f in F: how to choose parameters so that the performance of f improves. A way to evaluate the quality of a function f from F: a way of saying how badly or well this function f is doing at classifying real-world objects. The choice of a specific mapping function family F (K-NN, SVM, DT, RF, Neural Networks etc.).
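One concrete reading of those three elements (my mapping, for illustration only): scikit-learn's SGDClassifier makes them explicit, with a linear function family, a loss that scores how badly a candidate function does, and stochastic gradient descent as the search for a better one.

    from sklearn.datasets import load_digits
    from sklearn.linear_model import SGDClassifier

    X, y = load_digits(return_X_y=True)
    clf = SGDClassifier(loss="hinge",     # (2) how we measure how bad f is (hinge = linear SVM loss)
                        penalty="l2",
                        max_iter=1000)    # (3) SGD iterations: the search for a better f
    clf.fit(X, y)                         # (1) the family F here: linear functions of the pixels
    print(clf.score(X, y))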
  • 287.
  • 288. References • Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-learning) • Introduction to Machine Learning by Pascal Vincent given at Deep Learning Summer School, Montreal 2015 (http://videolectures.net/deeplearning2015_vincent_machine_learning/) • Welcome to Machine Learning by Konstantin Tretyakov delivered at AACIMP Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf) • Stanford CS class: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy (http://cs231n.github.io/) • Data Mining Course by Jaak Vilo at University of Tartu (https://courses.cs.ut.ee/MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf) • Machine Learning Essential Concepts by Ilya Kuzovkin (https://www.slideshare.net/iljakuzovkin) • From the brain to deep learning and back by Raul Vicente Zafra and Ilya Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)
  • 290.