This session shows how to quickly implement a Machine Learning model using Azure Automated ML and the Python SDK. The demo also covers the new toolkits developed by Microsoft that make it easy both to evaluate the performance of the prototyped model and to explain its behavior to executives and stakeholders.
(https://datasaturdays.com/events/datasaturday0001.html)
Unleashing the Power of Machine Learning with Azure AutoML
1. #datasatpn
February 27th, 2021
Data Saturday #1
Unleashing the Power of Machine Learning
Prototyping using Azure AutoML and Python
Luca Zavarella
2. Who Am I
Luca Zavarella
Working with SQL Server since 2007 (BI)
Microsoft MVP for Artificial Intelligence
Microsoft Certified: Azure Data Scientist Associate
DAMAG Founder, ODSC Ambassador
AI & ML Practice Director @
Email: lzavarella@lucient.com
Twitter: @lucazav
LinkedIn: http://it.linkedin.com/in/lucazavarella
Blog: medium.com/@lucazav
3. Agenda
• ML Prototyping
• Machine Learning Process
• Azure AutoML
  • Overview
  • Validation Types
  • Algorithms
  • Featurization
  • Data Guardrails
• Demo
• Conclusion
6. Steps To Build a Machine Learning Solution
1. Problem Framing
2. Get/Prepare Data
3. Develop Model
   3.1 Analysis/Metric definition
   3.2 Feature Engineering
   3.3 Model Training
   3.4 Parameter Tuning
   3.5 Evaluation
4. Deploy Model
5. Evaluate / Track Performance
7. What a Machine Learning Project Really Is
A Machine Learning project can be viewed as…
…a research and development activity
…later transformed into a Data Engineering project.
In short, a complete ML project could take months to implement!
8. Quoting an ML Project
Customer: Could you quote for an ML Proof of Concept?
You: … (Data Exploration, Feature Selection)
9. The Suggested POC Process
You / Customer
STEP 1 (2-3 days)
1. Define target
2. Understand what data is available
3. Define the schema of the input ML dataset
STEP 2 (X days)
1. Collect all the data
2. Clean the data
3. Provide data with the defined schema
STEP 3 (2-3 days)
1. All the ML magic stuff!
STEP 4 (1-2 days)
1. Documentation
2. Presentation of results
Fast tool for prototyping
17. Validation Types: Auto
• For datasets larger than 20,000 rows, 10% of the initial training data is held out as the validation set, and metrics are calculated on that set.
• For datasets smaller than 20,000 rows, cross-validation is applied: 10 folds if the dataset has fewer than 1,000 rows, 3 folds otherwise.
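The rule above can be sketched as a small Python function. This is only an illustration of the thresholds on the slide, not the SDK's own code: the function name and the returned dictionary keys are hypothetical, and the treatment of a dataset with exactly 20,000 rows (which the slide does not specify) is an assumption.

```python
def auto_validation_strategy(n_rows: int) -> dict:
    """Return the validation setup the 'Auto' rule would pick for n_rows.

    Hypothetical helper: names and return format are illustrative only.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows > 20_000:
        # Large dataset: hold out 10% of the training data for validation.
        return {"type": "train_validation_split", "validation_size": 0.10}
    # Assumption: exactly 20,000 rows falls into the cross-validation branch.
    if n_rows < 1_000:
        return {"type": "cross_validation", "n_folds": 10}
    return {"type": "cross_validation", "n_folds": 3}


if __name__ == "__main__":
    for rows in (500, 5_000, 50_000):
        print(rows, auto_validation_strategy(rows))
```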
22. Ensembling Models: Voting
Classification
• Hard Voting. Predict the class with the largest sum of votes from models.
• Soft Voting. Predict the class with the largest summed probability from models.
Regression
• The prediction is the average of the predictions of the base regressors.
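A minimal sketch of the three voting rules in plain Python follows. The function names and the toy inputs are hypothetical, and every model is given equal weight, which is an assumption made for simplicity; an actual ensemble may weight its base models differently.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Pick the class that receives the most votes (one label per model).

    Ties are resolved in favor of the label that appears first.
    """
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(predicted_probabilities):
    """Pick the class with the largest summed probability.

    predicted_probabilities: one dict per model, mapping class -> probability.
    """
    totals = {}
    for model_probs in predicted_probabilities:
        for label, prob in model_probs.items():
            totals[label] = totals.get(label, 0.0) + prob
    return max(totals, key=totals.get)


def average_regression(predictions):
    """Average the predictions of the base regressors."""
    return sum(predictions) / len(predictions)


if __name__ == "__main__":
    # Toy example: one confident model can outweigh two hesitant ones
    # under soft voting, while hard voting follows the majority.
    probs = [
        {"yes": 0.90, "no": 0.10},
        {"yes": 0.40, "no": 0.60},
        {"yes": 0.45, "no": 0.55},
    ]
    labels = [max(p, key=p.get) for p in probs]
    print("hard:", hard_vote(labels))
    print("soft:", soft_vote(probs))
    print("regression:", average_regression([10.0, 12.0, 14.0]))
```

In the toy input the two rules disagree, which shows why the choice between hard and soft voting matters when the base models differ in confidence.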
28. Data Guardrails in the UI
Highly imbalanced: the ratio of the samples in the least populated class to the samples in the most populated class is less than 20%.
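The imbalance check described above can be reproduced on a label column before submitting a run. The sketch below is an illustration of the stated rule only: the function names are hypothetical and the 20% threshold is taken from the slide.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the least populated class count to the most populated one."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("at least two classes are required")
    return min(counts.values()) / max(counts.values())


def is_highly_imbalanced(labels, threshold=0.20):
    """True when the ratio falls below the threshold (20% on the slide)."""
    return imbalance_ratio(labels) < threshold


if __name__ == "__main__":
    skewed = ["ok"] * 90 + ["fraud"] * 10      # ratio 10/90
    balanced = ["ok"] * 60 + ["fraud"] * 40    # ratio 40/60
    print(is_highly_imbalanced(skewed), is_highly_imbalanced(balanced))
```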
32. Azure AutoML Strengths
• A Python-based technology
• Easy integration with your custom pipelines
• Data normalizations and basic transformations included
• Complex featurization included
  • Also NLP feature engineering
• Ensemble models out of the box
• Automatic model explanations
33. Future of Azure AutoML
• Only basic imputers implemented
• Stratified cross-validation not implemented out of the box
• Highly imbalanced datasets not automatically fixed
• Explicit feature selection step missing
• Neural Networks still not included in training algorithms
  • Except for ForecastTCN for time-series forecasting
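Since stratified cross-validation is listed as missing, one possible workaround (not taken from the slides) is to precompute fold assignments that keep class proportions roughly equal across folds. The sketch below uses only the standard library; the function name is hypothetical, rows are assigned deterministically without shuffling, and how the resulting folds are handed to a training run is left out on purpose.

```python
from collections import Counter, defaultdict


def stratified_fold_ids(labels, n_folds=3):
    """Assign a fold id to every row so each class is spread across folds.

    Rows of each class are dealt out round-robin; the running offset keeps
    fold sizes balanced when class counts are not multiples of n_folds.
    """
    rows_by_class = defaultdict(list)
    for row_index, label in enumerate(labels):
        rows_by_class[label].append(row_index)

    fold_ids = [0] * len(labels)
    offset = 0
    for rows in rows_by_class.values():
        for position, row_index in enumerate(rows):
            fold_ids[row_index] = (position + offset) % n_folds
        offset += len(rows)
    return fold_ids


if __name__ == "__main__":
    labels = ["pos"] * 3 + ["neg"] * 9
    folds = stratified_fold_ids(labels, n_folds=3)
    for fold in range(3):
        in_fold = [lab for lab, f in zip(labels, folds) if f == fold]
        print(fold, Counter(in_fold))
```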
34. References
• Probabilistic Matrix Factorization for Automated Machine Learning (https://arxiv.org/abs/1705.05355)
• What is automated machine learning (AutoML)? (https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml)
• A Review of Azure Automated Machine Learning (AutoML) (https://medium.com/microsoftazure/a-review-of-azure-automated-machine-learning-automl-5d2f98512406)