2nd edition
#MLSEV 2
Automating Model Selection
Taking (most of) the work out of model tuning
Charles Parker
VP Algorithms, BigML, Inc
#MLSEV 3
Machine Learning for Machine Learning
#MLSEV 4
Parameter Optimization
• There are lots of algorithms and lots of parameters
• We don’t have time to try even close to everything
• If only we had a way to make a prediction . . .
Did I hear someone say
Machine Learning?
#MLSEV 5
The Allure of ML
“Why don’t we just use
machine learning to predict
the quality of a set of
modeling parameters before
we train a model on them?”
— Every first year ML grad student ever
#MLSEV 6
Bayesian Parameter Optimization
• The performance of an ML algorithm (with associated parameters) is data
dependent
• So: Learn from your previous attempts
• Train a model, then evaluate it
• After you’ve done a number of evaluations, learn a regression model to predict the
performance of future, as-yet-untrained models
• Use this regression model to choose a promising set of “next models” to evaluate
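The loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not BigML's implementation: `train_and_evaluate` is a hypothetical stand-in for actually training and scoring a model, the two parameters (`depth`, `rate`) are invented for illustration, and a simple k-nearest-neighbour average is swapped in as the regression model in place of the probabilistic surrogate a real Bayesian optimizer would use.

```python
import random


def train_and_evaluate(params):
    """Hypothetical stand-in for 'train a model, then evaluate it'.

    A real version would fit a model with these parameters and return its
    score on held-out data; this made-up function peaks at (6, 0.3).
    """
    depth, rate = params
    return 1.0 - ((depth - 6) / 10) ** 2 - (rate - 0.3) ** 2


def predict_score(history, params, k=3):
    """Regression surrogate: mean score of the k nearest evaluated points."""
    def distance(item):
        (depth, rate), _ = item
        return ((depth - params[0]) / 10) ** 2 + (rate - params[1]) ** 2

    nearest = sorted(history, key=distance)[:k]
    return sum(score for _, score in nearest) / len(nearest)


def search(n_initial=5, n_rounds=10, n_candidates=50, seed=0):
    rng = random.Random(seed)

    def sample():
        return (rng.randint(1, 10), rng.random())

    # Start with a few real evaluations chosen at random.
    history = [(p, train_and_evaluate(p)) for p in (sample() for _ in range(n_initial))]

    for _ in range(n_rounds):
        # Score many untrained candidates with the cheap surrogate ...
        candidates = [sample() for _ in range(n_candidates)]
        promising = max(candidates, key=lambda p: predict_score(history, p))
        # ... and spend a real evaluation only on the most promising one.
        history.append((promising, train_and_evaluate(promising)))

    return max(history, key=lambda item: item[1]), history


(best_params, best_score), history = search()
print(best_params, round(best_score, 3))
```

A real optimizer would also balance exploring unfamiliar regions against exploiting known good ones (for example with an acquisition function), which this sketch leaves out.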
#MLSEV 7 to 13
Bayesian Parameter Optimization
[Animated diagram built up across slides 7 to 13: candidate parameter sets (Parameters 1 to Parameters 6) are fed into a “Model and Evaluate” step. Scores of 0.75, 0.56, and 0.92 appear one at a time as evaluations complete. On the final slides a “Machine Learning!” box learns the mapping parameters ⟶ performance from the evaluated sets.]
Wow, Magic!
• So all of my problems are solved, right?
NO NO NO
• First, you’re selecting a model based on
held out data so you have to have
enough data to do an accurate model
selection
• Second, there are still important
remaining issues and possible ways to
screw up
#MLSEV 15
Remaining Issue #1: Metric Choice
#MLSEV 16
Driving The Search
• So how do we measure the performance of
each model, to figure out what to do next?
• If we choose the wrong metric, we’ll get
models that are the best at something that we
don’t really care about
• But there are so many metrics! How do we
choose the right one?
• Hmmmm, all of this sounds awfully familiar . . .
#MLSEV 17
Flashback #1
$$\text{Accuracy} = \frac{TP + TN}{\text{Total}}$$
• “Percentage correct” - like an exam
• If Accuracy = 1 then no mistakes
• If Accuracy = 0 then all mistakes
• Intuitive but not always useful
• Watch out for unbalanced classes!
• Remember, only 1 in 1000 have the disease
• A silly model which always predicts “well” is 99.9% accurate
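The arithmetic behind that last bullet can be checked directly. The sketch below builds the 1-in-1000 population described above and scores the always-“well” model; recall is added here as an assumption-free contrast (it is a standard metric, though not one named on this slide) to show what accuracy hides.

```python
# 1 sick patient in 1000; the "silly" model predicts "well" for everyone.
actual = ["sick"] + ["well"] * 999
predicted = ["well"] * 1000

pairs = list(zip(actual, predicted))
tp = sum(a == "sick" and p == "sick" for a, p in pairs)  # true positives
tn = sum(a == "well" and p == "well" for a, p in pairs)  # true negatives
fn = sum(a == "sick" and p == "well" for a, p in pairs)  # missed cases

accuracy = (tp + tn) / len(pairs)
recall = tp / (tp + fn)  # share of sick patients actually found

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
# → accuracy=0.999 recall=0.000
```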
#MLSEV 18
A Metric Selection Flowchart
[Flowchart: Yes/No questions leading to recommended metrics]
Questions:
• Will you bother about threshold setting?
• Is your dataset imbalanced?
• Is yours a “ranking” problem?
• Do you care more about the top-ranked instances?
Metrics at the leaves:
• Phi coefficient
• F-measure
• Accuracy
• Max. Phi
• KS-statistic
• Area Under the ROC / PR curve
• Kendall’s Tau
• Spearman’s Rho
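Two of the metrics named in the flowchart, the F-measure and the Phi coefficient, can be computed from the four confusion-matrix counts. The sketch below uses the standard textbook definitions (F1 for the F-measure); the function names and the choice to return 0.0 when a quantity is undefined are assumptions made for illustration.

```python
import math


def f_measure(tp, fp, fn):
    """F1: harmonic mean of precision and recall (0.0 if there are no true positives)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def phi_coefficient(tp, fp, fn, tn):
    """Phi (Matthews correlation) coefficient; 0.0 when a row or column is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# The always-"well" model from the 1-in-1000 example: 99.9% accurate, yet ...
print(f_measure(tp=0, fp=0, fn=1), phi_coefficient(tp=0, fp=0, fn=1, tn=999))
# → 0.0 0.0
```

Both metrics score the always-“well” model at zero, which is why they are safer choices than accuracy on imbalanced data.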
#MLSEV 19
Ranking Problems
Medical Diagnosis (no) vs. Stock Picking (yes)
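In a ranking problem such as stock picking, what matters is whether the model orders the instances correctly, not the exact scores. Kendall’s Tau measures this by comparing every pair of instances. The sketch below implements the simple tau-a variant, which ignores ties; the five stocks and their numbers are hypothetical.

```python
from itertools import combinations


def kendall_tau(predicted, actual):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(actual)), 2):
        direction = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(actual) * (len(actual) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical predicted scores and realised returns for five stocks.
predicted = [0.9, 0.7, 0.5, 0.3, 0.1]
realised = [0.12, 0.08, 0.09, 0.01, -0.03]
print(kendall_tau(predicted, realised))
# → 0.8
```

Only one of the ten pairs (the second and third stocks) is ordered wrongly, so the score is (9 - 1) / 10 = 0.8.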
#MLSEV 20
Remaining Issue #2: Holdout Choice
#MLSEV 21
Is Cross-Validation Right for You?
• Cross-validation is a good tool some of the time
• Many other times, it is disastrously bad (overly optimistic)
• This is why BigML offers the option for a specific holdout set.
• Should you use it?
#MLSEV 22
Flashback #2
• Okay, so I’m not testing on the training
data, so I’m good, right? NO NO NO
• You also have to worry about information
leakage between training and test data.
• What is this? Let’s try to predict the daily
closing price of the stock market
• What happens if you hold out 10 random
days from your dataset?
• What if you hold out the last 10 days?
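The difference between the two holdouts can be made concrete. In the sketch below, days are just integers (an assumption for illustration; no real price data is used), and the helper counts how many held-out days are older than some training day, meaning the model has effectively seen the future relative to them.

```python
import random

days = list(range(100))  # day 0 is the oldest, day 99 the most recent

# Random holdout: 10 days scattered through the history.
rng = random.Random(0)
random_test = set(rng.sample(days, 10))
random_train = set(days) - random_test

# Chronological holdout: the last 10 days.
last_test = set(days[-10:])
last_train = set(days[:-10])


def count_with_future_training(test_days, train_days):
    """Count test days that precede at least one training day."""
    latest_training_day = max(train_days)
    return sum(day < latest_training_day for day in test_days)


random_leaks = count_with_future_training(random_test, random_train)
last_leaks = count_with_future_training(last_test, last_train)
print(random_leaks, last_leaks)
```

With the random holdout, held-out days are surrounded by training days whose closing prices are nearly the same, so the evaluation looks far better than it would in real use. With the last-10-days holdout the count is zero.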
#MLSEV 23
Flashback #3
• This is common when you have time-distributed
data, but can also happen in other instances:
• Let’s say we have a dataset of 10,000 pictures
from 20 people, each labeled with the year in which
it was taken
• We want to predict the year from the image
• What happens if we hold out random data?
• Solution: Hold out users instead
#MLSEV 24
Again, Take Care!
• These situations are very common in all cases
where data comes in groups (days, users, etc.)
• The solution is to hold out whole groups of data
• It’s possible that it isn’t a problem in your
dataset, but when in doubt, try both!
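Holding out whole groups can be sketched as follows. The `group_holdout` helper and the photo records are hypothetical, made up to mirror the 10,000-pictures-from-20-people example; the point is only that the split is drawn over groups rather than over rows.

```python
import random


def group_holdout(rows, group_of, test_fraction=0.2, seed=0):
    """Split rows so that every group lands wholly in train or wholly in test."""
    groups = sorted({group_of(row) for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [row for row in rows if group_of(row) not in test_groups]
    test = [row for row in rows if group_of(row) in test_groups]
    return train, test


# Hypothetical data: 10,000 photos from 20 people, 500 each.
photos = [{"person": f"person_{i % 20}", "photo_id": i} for i in range(10_000)]
train, test = group_holdout(photos, group_of=lambda row: row["person"])

train_people = {row["person"] for row in train}
test_people = {row["person"] for row in test}
print(len(train), len(test), train_people & test_people)
# → 8000 2000 set()
```

To “try both”, evaluate once with this split and once with a plain random split of rows; a large gap between the two scores suggests the random split is leaking group information.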
#MLSEV 25
Remaining Issue #3: Model Choice?
#MLSEV 26
Which Model is Best?
• Performance isn’t the only issue!
• Retraining: Will the amount of data you have be different in the future?
• Fit stability: How confident must you be that the model’s behavior is
invariant to small data changes?
• Prediction speed: The difference can be orders of magnitude
#MLSEV 27
Flashback #4
Amount of data required: Linear models < trees, ensembles < deep learning
Potential to overfit: Linear models < ensembles < trees, deep learning
Speed: Linear models, trees < ensembles < deep learning
Representational power: Linear models < trees < ensembles < deep learning
• How much data do you have?
• How fast do you need things to go?
• How much performance do you really need?
#MLSEV 28
Modeling Tradeoffs
Simple (Logistic) on one side, Complex (Deepnets) on the other:
• Interpretability vs. Representability
• Weak vs. Slow
• Confidence vs. Performance
• Biased vs. Data-hungry
#MLSEV 29
Summary
• We can do some simple tricks and use
machine learning to help us search
through the space of possible models
• Even with this, however, there is still
lots of work for the domain expert
• Automated model selection relies on
data. If you don’t have enough, it will
go poorly!
MLSEV Virtual. Automating Model Selection

