Supervised learning

ML Fundamentals: Session 2
Supervised Learning with scikit-learn
Alia Hamwi
What is ML?
• “Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.” -Arthur Samuel (1959)
Traditional Programming vs. Machine Learning
When Do We Use Machine Learning?
• ML is used when:
• Humans can’t explain their expertise (speech recognition)
• Models are based on huge amounts of data (genomics)
• Learning isn’t always useful:
• There is no need to “learn” to calculate payroll
When Do We Use Machine Learning?
• A classic example of a task that requires machine learning:
It is very hard to say what makes a 2
Types of Learning
• Supervised (inductive) learning
- Given: training data + desired outputs (labels)
Types of Learning
• Unsupervised learning
- Given: training data (without desired outputs)
Types of Learning
• Semi-supervised learning
- Given: training data + a few desired outputs
Types of Learning
• Reinforcement learning
- Given: rewards from a sequence of actions
Types of Supervised learning
• Classification: in a classification problem, the output variable is a category,
such as “red” or “blue”, or “disease” and “no disease”.
• Regression: in a regression problem, the output variable is a real value,
such as an amount in “dollars” or a “weight”.
Supervised learning Applications
• Text categorization (news)
• Face recognition / object recognition / signature recognition
• Music genre classification (for recommendations, e.g., Spotify)
• Spam detection (Gmail)
• Weather forecasting
• Predicting housing prices
• Stock price prediction
• Predicting a product's price from its attributes
• Predicting whether an employee will leave the company (HR systems)
As an ML Engineer...
• Now, choose the right answers for these use cases:
https://forms.gle/zDfcQuxX22UfjUUc6
ML Pipeline
Data Collection
- Rows: examples (instances)
- Columns: features, plus one column for the target/label
- Values:
  - Numeric data
  - Ordinal data: the categories have an inherent order
  - Nominal data: the categories do not have an inherent order
Data Collection
Data Preparation
• Data cleaning
• Remove unwanted data content
• Check formatting
• Imputation (handling missing data):
• Numerical: replace with the mean or median
• Categorical: replace with the most frequent category, or add a new "Missing" category
• Both: drop the example (row)
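The three imputation strategies above can be sketched with the standard library alone. In practice, scikit-learn's `SimpleImputer` covers the same ground with its `"mean"`, `"median"`, `"most_frequent"` and `"constant"` strategies. This sketch makes three assumptions of its own: `None` marks a missing value, the function names are hypothetical, and the toy columns are made up for illustration.

```python
from collections import Counter
from statistics import mean, median

def impute_numeric(values, strategy="mean"):
    """Fill missing (None) numeric entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

def impute_categorical(values, strategy="most_frequent"):
    """Fill missing (None) categories with the most frequent one,
    or (for any other strategy value) with a new "Missing" category."""
    if strategy == "most_frequent":
        observed = [v for v in values if v is not None]
        fill = Counter(observed).most_common(1)[0][0]
    else:
        fill = "Missing"
    return [fill if v is None else v for v in values]

def drop_incomplete(rows):
    """Drop every example (row) that contains at least one missing value."""
    return [row for row in rows if all(v is not None for v in row)]

# Hypothetical toy columns; None marks a missing entry.
ages = [25, None, 40, 31]
colors = ["red", "blue", None, "red"]
print(impute_numeric(ages))                       # mean of 25, 40, 31 is 32
print(impute_numeric(ages, strategy="median"))    # [25, 31, 40, 31]
print(impute_categorical(colors))                 # ['red', 'blue', 'red', 'red']
print(impute_categorical(colors, strategy="new_category"))  # ['red', 'blue', 'Missing', 'red']
print(drop_incomplete(list(zip(ages, colors))))   # [(25, 'red'), (31, 'red')]
```

Dropping rows is the simplest option, but it discards the observed values in those rows too. That is why the imputation strategies are usually tried first.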
Data Preparation: Encoding
• One-hot encoding / dummy variables
• For each level (category) of a categorical feature, we create a new binary variable
containing either 0 or 1: 0 represents the absence and 1 the presence of that category.
This suits nominal features, whose categories have no inherent order.
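The mapping can be sketched in plain Python. scikit-learn's `OneHotEncoder` and pandas' `get_dummies` implement the same idea. Sorting the categories to fix the column order is an assumption of this sketch, and `one_hot_encode` is a hypothetical helper name.

```python
def one_hot_encode(values):
    """Map each category to a binary vector with a single 1 at that category's position."""
    categories = sorted(set(values))              # one new column per category
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)               # 0 = category absent
        row[index[v]] = 1                         # 1 = category present
        encoded.append(row)
    return categories, encoded

cats, rows = one_hot_encode(["red", "blue", "green", "blue"])
print(cats)   # ['blue', 'green', 'red']
print(rows)   # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```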
Data Preparation: Encoding
• Label encoding / ordinal encoding
• We use this categorical data encoding technique when the categorical feature
is ordinal (e.g., exam grades, days of the week, sizes). In this case, retaining the
order is important, so the encoding should reflect the sequence.
• Example: 'Degree': {'None': 0, 'High school': 1, 'Diploma': 2, 'Bachelors': 3, 'Masters': 4, 'phd': 5}
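The order of an ordinal feature comes from domain knowledge, not from the data. This sketch therefore takes the order as an explicit argument and reuses the 'Degree' example above. In scikit-learn, `OrdinalEncoder` accepts such an order through its `categories` parameter. Its `LabelEncoder` is intended for target labels and assigns codes in sorted order, so it would not preserve a custom sequence like this one. `ordinal_encode` is a hypothetical helper name.

```python
# Order taken from the 'Degree' example (lowest to highest).
DEGREE_ORDER = ["None", "High school", "Diploma", "Bachelors", "Masters", "phd"]

def ordinal_encode(values, order):
    """Encode each category as its position in a user-supplied order.
    Raises KeyError for a category that is not in the order."""
    mapping = {cat: rank for rank, cat in enumerate(order)}
    return [mapping[v] for v in values]

print(ordinal_encode(["Masters", "None", "Diploma"], DEGREE_ORDER))  # [4, 0, 2]
```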
Data Preparation:
• Standardization
• Standardization rescales each feature using the mean and standard deviation of its
values. Raw features can range from very low to very high values, and features on very
different scales can degrade the performance of many models (for example, distance-based
ones such as k-nearest neighbors). After standardization, each feature has a mean of zero
and a standard deviation of one.
• The standardization formula is:
z = (feature_value - mean) / standard_deviation
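The formula can be applied to one feature column as sketched below. The sample values are made up so that the arithmetic comes out exact (mean 5, standard deviation 2). scikit-learn's `StandardScaler` applies the same transformation per feature and, like this sketch, uses the population standard deviation. The handling of a constant column is this sketch's own choice.

```python
from statistics import mean, pstdev

def standardize(values):
    """Apply z = (feature_value - mean) / standard_deviation to one feature column."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    if sigma == 0:
        # Constant feature: nothing to scale, so this sketch only centers it.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)                   # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z), pstdev(z))  # 0.0 1.0
```

The mean and standard deviation should be computed on the training data only and then reused to transform validation and test data.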
Model Training
• Classification:
• Logistic regression
• K nearest neighbors
• Support vector classification (SVM)
• Naïve Bayes
• Regression
• Linear regression with different kinds of regularization:
• Lasso (L1)
• Ridge (L2)
• Elastic Net (L1 + L2)
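Of the classifiers listed above, k-nearest neighbors is simple enough to write out by hand: a new point gets the majority label among the k training points closest to it. This sketch makes a few assumptions of its own: Euclidean distance, a made-up 2-D toy dataset, and the hypothetical helper name `knn_predict`. scikit-learn's `KNeighborsClassifier` is the library equivalent.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["blue", "blue", "blue", "red", "red", "red"]
print(knn_predict(train_X, train_y, (2, 2)))  # blue
print(knn_predict(train_X, train_y, (6, 5)))  # red
```

An odd k avoids tied votes between two classes. Because the model relies on distances, it benefits from the standardization step above.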
Model Training
• Cross validation
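In the usual k-fold scheme, the training data is split into k folds. The model is trained on k-1 folds and validated on the remaining one, rotating until every fold has served once as the validation set, and the k scores are then averaged. The sketch below only produces the index splits and assumes unshuffled, contiguous folds. scikit-learn's `KFold` splits the same way by default, and `cross_val_score` wraps the whole train-and-score loop. `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

for train, val in k_fold_indices(10, k=3):
    print("train:", train, "validate:", val)
```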
Model Evaluation
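• Overfitting (the model fits the training data too closely and generalizes poorly). Possible remedies:
• Adding more data
• Data augmentation
• Regularization
• Removing features from data
• Underfitting (the model is too simple to capture the patterns in the data). Possible remedies:
• Increasing the model complexity
• Reducing regularization
• Adding features to training data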
As an ML Engineer...
• Now, choose the right answers for these use cases:
https://forms.gle/fN2y2nRueviBf2JX6
Model Evaluation
• Confusion matrix
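For a binary classifier, the confusion matrix holds four counts: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). Below is a minimal sketch that counts them. The labels are made up, 1 is assumed to be the positive class, and `confusion_counts` is a hypothetical helper name. The printed layout `[[TN, FP], [FN, TP]]` matches what `sklearn.metrics.confusion_matrix` returns for labels `[0, 1]`: rows are actual classes and columns are predicted classes.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, fp, fn, tn

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print([[tn, fp], [fn, tp]])  # [[4, 2], [1, 3]]
```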
Model Evaluation
• Precision = TP / (TP + FP), where TP and FP are the numbers of true positives and
false positives: out of all the values the model predicted as positive, it is the fraction
that are actually positive. Precision matters in applications such as music or video
recommendation systems and e-commerce websites, where wrong results could lead to
customer churn and harm the business.
• It measures how reliable the model's positive predictions are. It is the metric to watch
when a false positive is more costly than a false negative.
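Precision in terms of confusion-matrix counts, as a small sketch. The counts in the usage line are made up. Returning 0.0 when there are no positive predictions is a convention chosen here; scikit-learn's `precision_score` exposes that choice through its `zero_division` parameter.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are actually positive: TP / (TP + FP)."""
    if tp + fp == 0:
        return 0.0  # no positive predictions; 0.0 is this sketch's convention
    return tp / (tp + fp)

print(precision(tp=3, fp=2))  # 0.6
```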
Model Evaluation
• Recall = TP / (TP + FN), where TP is the number of true positives and FN the number of
false negatives: out of all the actual positive values, it is the fraction the model
correctly predicted as positive.
• Recall (also called sensitivity) is a useful metric when a false negative is more costly
than a false positive. Recall is important in medical diagnosis, where raising a false
alarm is acceptable but actual positive cases should not go undetected.
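Recall in terms of confusion-matrix counts, as a matching sketch with made-up counts. The second call illustrates the medical case. A model that flags every patient as positive never misses a case, so its recall is 1.0, but it pays for this with many false alarms and therefore low precision.

```python
def recall(tp, fn):
    """Fraction of actual positives the model finds: TP / (TP + FN)."""
    if tp + fn == 0:
        return 0.0  # no actual positives; 0.0 is this sketch's convention
    return tp / (tp + fn)

print(recall(tp=3, fn=1))  # 0.75
print(recall(tp=4, fn=0))  # 1.0 (no positive case went undetected)
```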
Model Evaluation
• Increasing precision usually decreases recall and vice versa (for example, when the
model's decision threshold is moved); this is known as the precision/recall tradeoff.
• When one model has low precision and high recall and another has the opposite, it is
hard to compare them. To solve this, we can use the F-score (F1 score), the harmonic mean
of precision and recall.
• The F1 score is high only when both precision and recall are high; for a given average of
the two, it is largest when recall equals precision. It is calculated as:
F1 = 2 * (precision * recall) / (precision + recall)
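A sketch of the F1 score as the harmonic mean of precision and recall. The two calls compare hypothetical models whose precision and recall have the same arithmetic mean (0.5). The unbalanced model scores far lower, which is why F1 rewards models that do well on both metrics. Returning 0.0 when both inputs are zero is this sketch's own convention.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # both metrics are zero; 0.0 is this sketch's convention
    return 2 * precision * recall / (precision + recall)

print(f1(0.9, 0.1))  # about 0.18: unbalanced model
print(f1(0.5, 0.5))  # 0.5: balanced model with the same average
```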
Thank You