#datasatpn
February 27th, 2021
Data Saturday #1
Unleashing the Power of Machine Learning
Prototyping using Azure AutoML and Python
Luca Zavarella
Who Am I
Luca Zavarella
Working with SQL Server since 2007 (BI)
Microsoft MVP for Artificial Intelligence
Microsoft Certified: Azure Data Scientist Associate
DAMAG Founder, ODSC Ambassador
AI & ML Practice Director @
Email: lzavarella@lucient.com
Twitter: @lucazav
LinkedIn: http://it.linkedin.com/in/lucazavarella
Blog: medium.com/@lucazav
Agenda
• ML Prototyping
• Machine Learning Process
• Azure AutoML
  • Overview
  • Validation Types
  • Algorithms
  • Featurization
  • Data Guardrails
• Demo
• Conclusion
Machine Learning Process
Components of a BI Project
ONE WAY
Steps To Build a Machine Learning Solution
1. Problem Framing
2. Get/Prepare Data
3. Develop Model
   3.1 Analysis/Metric definition
   3.2 Feature Engineering
   3.3 Model Training
   3.4 Parameter Tuning
   3.5 Evaluation
4. Deploy Model
5. Evaluate / Track Performance
What a Machine Learning Project Really Is
A Machine Learning project can be viewed as…
…a research and development activity
…later transformed into a Data Engineering project.
Icons by Vectors Market from the Noun Project
In short, a complete ML project could take months to implement!
Quoting an ML Project
Customer: Could you quote for an ML Proof of Concept?
You… (Data Exploration, Feature Selection)
The Suggested POC Process
You
Customer
STEP 1 (2-3 days)
1. Define target
2. Understand what data is available
3. Define the schema of the input ML dataset
STEP 2 (X days)
1. Collect all the data
2. Clean the data
3. Provide data with the defined schema
STEP 3 (2-3 days)
1. All the ML magic stuff!
STEP 4 (1-2 days)
1. Documentation
2. Presentation of results
Fast tool for prototyping
AutoML Overview
Azure AutoML Workflow
AutoML Task Types
Primary Metrics
AutoML Validation Types
Validation Types: Train-Validation Split
Validation Types: KFold vs MonteCarlo
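The contrast named in this slide title can be sketched in plain Python. This is a generic illustration of the two resampling schemes, not AutoML's internal code: the function names and parameters (`k_fold_splits`, `monte_carlo_splits`, `validation_fraction`, `seed`) are hypothetical. K-fold partitions the rows into k disjoint validation folds, so every row is validated exactly once. Monte Carlo draws a fresh random split on each repeat, so rows may be validated several times or never.

```python
import random


def k_fold_splits(n_rows, k):
    """Return k (train, validation) index pairs with disjoint validation folds."""
    indices = list(range(n_rows))
    # Deal the indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, valid))
    return splits


def monte_carlo_splits(n_rows, n_repeats, validation_fraction, seed=0):
    """Return n_repeats independent random (train, validation) index pairs."""
    rng = random.Random(seed)
    n_valid = max(1, int(n_rows * validation_fraction))
    splits = []
    for _ in range(n_repeats):
        shuffled = list(range(n_rows))
        rng.shuffle(shuffled)
        # The first n_valid shuffled rows are held out; the rest are used for training.
        splits.append((shuffled[n_valid:], shuffled[:n_valid]))
    return splits
```

With K-fold, the validation folds together cover the whole dataset. With Monte Carlo, the number of repeats and the validation size are chosen independently of each other.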
Validation Types: Auto
• For datasets larger than 20,000 rows, 10% of the initial training data is held out as the validation set, and metrics are calculated on that validation set.
• For datasets smaller than 20,000 rows, cross-validation is applied: 10 folds if the dataset has fewer than 1,000 rows; 3 folds otherwise.
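The rule above can be written as a small decision function. This is a minimal sketch of the rule as stated on the slide, not a call into the Azure SDK: the function name and the returned dictionary keys are hypothetical. The slide does not say what happens at exactly 20,000 rows, so the sketch assumes that case falls under cross-validation.

```python
def auto_validation_strategy(n_rows):
    """Pick a validation strategy from the row count, following the slide's rule."""
    if n_rows > 20_000:
        # Large dataset: hold out 10% of the training data for validation.
        return {"method": "train_validation_split", "validation_size": 0.10}
    if n_rows < 1_000:
        # Very small dataset: use more folds.
        return {"method": "cross_validation", "n_folds": 10}
    # Assumption: exactly 20,000 rows is treated like the smaller case.
    return {"method": "cross_validation", "n_folds": 3}


for rows in (500, 5_000, 50_000):
    print(rows, auto_validation_strategy(rows))
```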
Validation Types: Rolling Origin Cross-Validation
Only for time-series forecasting
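As a rough illustration of the idea, rolling origin cross-validation keeps the time order: each fold trains on everything before an origin point and validates on the next few observations, and the origin moves forward from fold to fold. The sketch below is generic and not AutoML's implementation; `rolling_origin_splits`, `n_folds`, and `horizon` are hypothetical names.

```python
def rolling_origin_splits(n_obs, n_folds, horizon):
    """Return (train, validation) index pairs that respect time order."""
    splits = []
    for fold in range(n_folds):
        # The origin is the index of the first validation observation in this fold.
        origin = n_obs - (n_folds - fold) * horizon
        if origin <= 0:
            raise ValueError("series too short for the requested folds and horizon")
        train = list(range(0, origin))
        valid = list(range(origin, origin + horizon))
        splits.append((train, valid))
    return splits


for train, valid in rolling_origin_splits(n_obs=10, n_folds=3, horizon=2):
    print("train up to", train[-1], "validate on", valid)
```

Unlike K-fold, no fold ever trains on observations that come after its validation window.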
AutoML Algorithms
Supported Algorithms
Ensembling Models: Stacking
Ensembling Models: Voting
Classification
• Hard Voting. Predict the class with the largest sum of votes from models.
• Soft Voting. Predict the class with the largest summed probability from models.
Regression
• The prediction is the average of the predictions of the base regressors.
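The voting rules above can be sketched in a few lines of plain Python. This is an illustration only, with hypothetical function names and made-up model outputs; it is not how AutoML builds its ensembles. The example is chosen so that hard and soft voting disagree.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Return the label predicted by the most models (ties go to the first seen)."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(model_probabilities):
    """Return the label with the largest summed probability across models."""
    totals = {}
    for probs in model_probabilities:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get)


def average_regression(predictions):
    """Return the mean of the base regressors' predictions."""
    return sum(predictions) / len(predictions)


# Hypothetical class probabilities from three base classifiers.
probabilities = [
    {"yes": 0.90, "no": 0.10},
    {"yes": 0.40, "no": 0.60},
    {"yes": 0.45, "no": 0.55},
]
# Each model's hard prediction is its most probable class.
labels = [max(p, key=p.get) for p in probabilities]

print(hard_vote(labels))         # two of three models say "no"
print(soft_vote(probabilities))  # summed probability favours "yes"
print(average_regression([10.0, 12.0, 14.0]))
```

One confident model can outweigh two lukewarm ones under soft voting, which is why the two rules can return different classes.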
Ensembling Models in AutoML
Featurization in AutoML
Featurization: Scaling and Normalization
Featurization: Feature Engineering
Data Guardrails in AutoML
Data Guardrails in the UI
Highly imbalanced: the ratio of samples in the least populated class to samples in the most populated class is less than 20%.
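The imbalance criterion above is easy to check by hand before submitting a run. The sketch below is a minimal, assumed reimplementation of that ratio test, not the guardrail's actual code; the function names and the sample labels are hypothetical.

```python
from collections import Counter


def class_balance_ratio(labels):
    """Ratio of the least populated class count to the most populated class count."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def is_highly_imbalanced(labels, threshold=0.20):
    """Flag the dataset when the ratio falls below the threshold from the slide."""
    return class_balance_ratio(labels) < threshold


# Hypothetical target column: 90 negative rows and 10 positive rows.
labels = ["no"] * 90 + ["yes"] * 10
print(class_balance_ratio(labels), is_highly_imbalanced(labels))
```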
Data Guardrails Using Python SDK
DEMO
Auto Insurance Claims Data
Conclusion
Azure AutoML Strengths
• A Python-based technology
• Easy integration with your custom pipelines
• Data normalizations and basic transformations included
• Complex featurization included
  • Also NLP feature engineering
• Ensemble models out of the box
• Automatic model explanations
Future of Azure AutoML
• Only basic imputers implemented
• Stratified cross-validation not implemented out of the box
• Highly imbalanced datasets not automatically fixed
• Explicit feature selection step missing
• Neural Networks still not included in training algorithms
  • Except for ForecastTCN for time-series forecasting
References
• Probabilistic Matrix Factorization for Automated Machine Learning (https://arxiv.org/abs/1705.05355)
• What is automated machine learning (AutoML)? (https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml)
• A Review of Azure Automated Machine Learning (AutoML) (https://medium.com/microsoftazure/a-review-of-azure-automated-machine-learning-automl-5d2f98512406)
Thank you!
