Introduction to
Machine Learning
(5 ECTS)
Giovanni Di Liberto
Asst. Prof. in Intelligent Systems, SCSS
Room G.15, O’Reilly Institute
Trinity College Dublin, The University of Dublin
Overview previous lecture
• Binary classification
• Evaluation
• Overfitting
• Cross-validation
• Imbalanced datasets
• Multiclass classification
Overview lecture
• Classification algorithms
• K-nearest neighbour (KNN)
• Decision tree
• Support Vector Machines (SVM)
• Data projection (introduction)
Binary classification – evaluation metrics
Imbalanced datasets:
[Figure: a dataset table whose Class column reads 1 0 0 0 0]
Binary classification task:
- Is this a number five or not?
- 10 digits
- Each digit with the same number of occurrences in the
dataset
- Ideal chance-level of a multiclass classifier: 1/10 = 0.1 = 10%
(what is the chance of decoding the exact digit)
- Ideal chance-level of a binary classifier (is it a 5 or not?)
- It’s tricky. For example, a classifier that always returns ‘not a
5’ would be 90% correct (as 90% of the digits are not a 5).
So, a good classifier should do better than that. But better at
what? Precision, recall, or both?
Precision vs. recall – confusion matrix
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Accuracy = (3+5)/(3+5+1+2) = 8/11 ~ 0.73
Precision: “3 out of 4 of my predictions were correct. I made one mistake. I could have been more precise!”
Recall: “I detected 3 out of 5 elements. I missed 2 of them!”
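The numbers on this slide can be checked with a few lines of Python, taking TP, FP, FN, TN from the confusion matrix above:

```python
# Counts from the slide's confusion matrix: TP=3, FP=1, FN=2, TN=5
TP, FP, FN, TN = 3, 1, 2, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 8/11 ~ 0.73
precision = TP / (TP + FP)                   # 3 out of 4 predictions correct
recall = TP / (TP + FN)                      # 3 out of 5 actual positives detected

print(round(accuracy, 2), precision, recall)  # 0.73 0.75 0.6
```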
Precision vs. recall – confusion matrix
Binary classification – evaluation metrics
Imbalanced datasets:
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Binary classification – evaluation metrics
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Trade-off
Binary classification – evaluation metrics
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
ROC: Receiver operating characteristic
Binary classification – evaluation metrics
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
ROC: Receiver operating characteristic
Binary classification – evaluation metrics
F1-Score = harmonic mean of precision and recall
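As a quick sketch, the F1-score can be computed directly from the precision and recall of the earlier confusion-matrix example:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision = 0.75 and recall = 0.6, as in the earlier confusion matrix
print(f1_score(0.75, 0.6))  # ~ 0.667
```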
Multiclass classification – evaluation metrics
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
[Two confusion matrix plots (actual class vs. predicted class): one shows a great classification result; the other shows that lots of instances are misclassified as ‘8’]
Baseline – real vs. ideal
- With small datasets, there is a higher chance that a random classifier performs well purely by luck
- So, classification results should be compared with a baseline (or chance level) that takes
the sample size (N) into account
https://www.discovermagazine.com/mind/machine-learning-exceeding-chance-level-by-chance
Baseline – real vs. ideal - intuition
- N: number of coin tosses; x̄: average number of heads
- Large dataset (N = 10000): 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 …
-> P(‘1’) = 50% -> half of the time we get ‘1’. A small fluctuation (one extra ‘1’) barely changes the overall balance
between classes (50% -> 50.01%)
- Small dataset (N = 10): 0 1 0 1 1 1 1 1 0 0
-> P(‘1’) = 50% in expectation, but the same small fluctuation (one extra ‘1’) changes the observed balance a lot
(50% -> 60%)
https://www.discovermagazine.com/mind/machine-learning-exceeding-chance-level-by-chance
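A small simulation makes this concrete (`best_chance_accuracy` is an illustrative helper invented here, not from the lecture): over many runs, a pure guesser can score very high on N = 10 but stays pinned near 50% on N = 10000.

```python
import random

random.seed(0)

def best_chance_accuracy(n, trials):
    """Best accuracy a random guesser achieves across repeated runs
    on n balanced binary labels (illustrative toy helper)."""
    best = 0.0
    for _ in range(trials):
        hits = sum(random.random() < 0.5 for _ in range(n))
        best = max(best, hits / n)
    return best

small = best_chance_accuracy(10, trials=2000)     # N = 10
large = best_chance_accuracy(10_000, trials=50)   # N = 10000
print(small, large)  # small N can hit 80-100% by luck; large N stays near 50%
```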
Classification in Python
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
X is the data matrix (features)
y is the class label (‘five’ or ‘not a five’)
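The slide shows the book's MNIST code; as a rough stand-in, the same ‘five vs. not a five’ task can be sketched with scikit-learn's built-in digits dataset (the dataset choice and classifier settings here are assumptions, not the lecture's exact example):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X = digits.data                      # X is the data matrix (features)
y = (digits.target == 5)             # y is the class: 'five' or 'not a five'

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = SGDClassifier(random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))     # should beat the ~90% 'always not-5' baseline
```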
Types of classification
https://machinelearningmastery.com/types-of-classification-in-machine-learning/
Each type may require different methods:
- Binary (e.g., medical diagnosis): Logistic Regression, k-Nearest Neighbors, Decision Trees, Support Vector Machine, Naive Bayes
- Multiclass: k-Nearest Neighbors, Decision Trees, Naive Bayes, Random Forest, Gradient Boosting; via binary strategies (one vs. all, one vs. one): Support Vector Machine, Logistic Regression
- Imbalanced (e.g., anomaly detection)
K-nearest neighbours (KNN)
https://www.analyticssteps.com/blogs/how-does-k-nearest-neighbor-works-machine-learning-classification-problem
https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn
New instance (Red)
Neighbourhood: k=5:
5 green, 0 blue -> selecting green class
K-nearest neighbours (KNN)
https://www.analyticssteps.com/blogs/how-does-k-nearest-neighbor-works-machine-learning-classification-problem
https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn
New instance (Red)
Neighbourhood: k=5:
3 green, 2 blue -> selecting green class
Step 1: start from labelled data
Step 2: calculate the distances between the
new instance and the labelled instances;
keep the k nearest neighbours
Step 3: Count! What’s the most frequent
class in the neighbourhood?
K-nearest neighbours (KNN)
Algorithm:
Given a dataset
For each new instance
Find neighbourhood based on feature space
Select most frequent class in the neighbourhood
Pros:
- Simple
- Applies to non-linear data
- No explicit model fitting or difficult tuning is needed
Cons (basic version):
- The model needs to store large amounts of data
- Slow at generating predictions
- Slower and heavier with increasing dataset size
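The algorithm above can be sketched from scratch (the toy points are invented for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distances to all points
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy labelled data: a 'green' cluster near (1, 1) and a 'blue' cluster near (5, 5)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(['green', 'green', 'green', 'blue', 'blue'])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # green
```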
K-nearest neighbours (KNN)
Example: Is a bike damaged?
Based on:
- Feature 1: average speed
- Feature 2: how much was it
used in the last 24h
How much was it used (hours)
Average speed
K-nearest neighbours (KNN)
Example: Is a bike damaged?
Based on:
- Feature 1: average speed
- Feature 2: how much was it
used in the last 24h
Imbalanced classification
How much was it used (hours)
Average speed
Decision tree
[Decision tree over the bike features (hours used, average speed), with splits such as ‘Used > 3h’, ‘Used > 6h’, and ‘Avg speed > 15 km/h’, and Yes/No branches leading to the leaves]
Decision tree
Optimal split at every iteration?
We need to select a metric! -> homogeneity of the target variable in the subsets (e.g., entropy,
information gain)
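Entropy and information gain can be sketched in a few lines (the toy labels are invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` + `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ['damaged'] * 4 + ['ok'] * 4        # maximally mixed: entropy = 1 bit
left, right = ['damaged'] * 4, ['ok'] * 4    # perfect split: each child is pure
print(information_gain(parent, left, right))  # 1.0
```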
Decision tree
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Decision tree
- Simple to understand and visualise
- It can handle both numerical and categorical data
- It works with little data
- Classification results are often not great on their own
- Unstable (small changes in the data may result in big changes in the decision tree)
- A Random Forest runs many decision trees on subsamples of the data. Combining many
trees leads to better classification results. However, that is a computationally expensive process (it
takes time).
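A minimal sketch of that comparison, assuming scikit-learn and its built-in digits dataset (not the lecture's data):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One decision tree vs. an ensemble of trees trained on subsamples of the data
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print(tree.score(X_test, y_test), forest.score(X_test, y_test))
# the forest typically scores noticeably higher than the single tree
```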
Support Vector Machine (SVM)
Linear Binary SVM Classification
- Scenario where the two classes are linearly
separable
- The solid line in the plot on the right represents
the decision boundary of an SVM classifier
- This line both separates the two classes and stays as far
away from the closest training instances as
possible
Support Vector Machine (SVM)
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Support Vector Machine (SVM)
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Support Vector Machine (SVM)
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Support Vector Machine (SVM)
“Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow”, Aurélien Géron, 2019
Soft-margin classification: margin violations are possible and tolerated, but limited.
How much they are tolerated is decided by the parameter C.
- Small C: wider margin, but many data points end up inside the margin
- Large C: smaller margin with fewer margin violations
- A very large C would not be good (too specific to this dataset, too sensitive to the
outliers)
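The effect of C can be sketched with scikit-learn (the toy blobs and the specific C values are assumptions for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two toy clusters (made up for illustration)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.2, random_state=0)

svm_small_C = SVC(kernel='linear', C=0.01).fit(X, y)  # soft: wide margin, many violations
svm_large_C = SVC(kernel='linear', C=100).fit(X, y)   # strict: narrow margin, few violations

# A wider margin means more points lie on or inside it -> more support vectors
print(sum(svm_small_C.n_support_), sum(svm_large_C.n_support_))
```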
Support Vector Machine (SVM)
http://www.mlfactor.com/svm.html
Support Vector Machine (SVM)
- Some datasets are not even close
to being linearly separable.
- One approach is to use
polynomial features
e.g., x2 = (x1)^2
x3 = (x1)^3
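A minimal sketch of this idea, using a made-up 1-D dataset where adding x2 = (x1)^2 as a feature makes the classes linearly separable:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# 1-D toy data that is not linearly separable: class 1 sits on BOTH outer sides
x1 = np.linspace(-2, 2, 40).reshape(-1, 1)
y = (np.abs(x1.ravel()) > 1).astype(int)

linear = LinearSVC(C=10, dual=False).fit(x1, y)
poly = make_pipeline(PolynomialFeatures(degree=2),   # adds x2 = (x1)^2
                     LinearSVC(C=10, dual=False)).fit(x1, y)

print(linear.score(x1, y), poly.score(x1, y))
# the polynomial version should do much better than the purely linear one
```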
Data projection
Y ∈ {green, blue}; X = [x1, x2]
Xproj = X - [2, 0] = [x1 - 2, x2]
[Plot: the data in the original axes (x1, x2) and in the projected axes (xproj1, xproj2)]
Data projection
Y ∈ {green, blue}; X = [x1, x2]
Xproj = X - [2, 3] = [x1 - 2, x2 - 3]
[Plot: the data in the original axes (x1, x2) and in the projected axes (xproj1, xproj2)]
Data projection
A projection is a transformation of data points from one axis system to another
[Plots: the same data shown in the original axes (x1, x2) and in two projected axis systems (xproj1, xproj2)]
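Such a projection is just arithmetic on the coordinates. A minimal sketch (only the translation appears on the slides; the rotation and the toy points are added illustrations):

```python
import numpy as np

X = np.array([[3.0, 1.0], [4.0, 2.0], [2.5, 0.5]])   # toy points in the (x1, x2) axes

# Translation (as on the earlier slide): shift the origin to [2, 0]
X_shift = X - np.array([2.0, 0.0])

# Rotation: express the same points in axes rotated by 45 degrees
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X_rot = X @ R.T

print(X_shift[0])  # [1. 1.]
```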
Data projection
[Plots in the (x1, x2) axes: a bad projection vs. a good projection]
Data projection
[Plot in the (x1, x2) axes: the good projection axis]
LDA: Linear Discriminant Analysis
Find the axis that:
- Maximises the variance of the class
means (between-class)
- Minimises the within-class variance
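A sketch of LDA with scikit-learn (the iris dataset is my choice for illustration, not from the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project the 4-D data onto the single axis that best separates the classes:
# maximal between-class variance, minimal within-class variance
lda = LinearDiscriminantAnalysis(n_components=1)
X_proj = lda.fit_transform(X, y)

print(X_proj.shape)  # (150, 1)
```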
Data projection
[Plot in the (x1, x2) axes: the data projected onto the good axis, xproj]
Perfect separability between classes
Data projection
Y ∈ {green, blue}; X = [x1, x2]
[Plots: the same data viewed in two different axis systems]
Sometimes it is easier to look at things from a different angle,
instead of searching for a complicated solution


Speaker notes

  1. Mention that the main challenge is always to determine those axes (features). Not just 2D, multidimensional. It could be age, height,
  2. Algorithms for constructing decision trees usually work top-down, by choosing at each step the variable that best splits the set of items. Different metrics could be used to define what “best” means, such as information gain (entropy)