Microsoft Azure ML Studio provides an easy-to-use interface to build and deploy machine learning models. However, the user must carefully select and configure the modules in order to derive meaningful results. In this presentation, I discuss a case study to highlight best practices in building machine learning models.
Best practices in building machine learning models in Azure ML
1. Microsoft Global AI Bootcamp
Best practices in building machine learning models in Azure ML
Zeydy Ortiz, Ph.D.
zortiz@datacrunchlab.com
www.linkedin.com/in/zortiz
@DrZeydy @DataCrunch_Lab
3. DataCrunch
Lab
Founded in 2016, ICMM is a nonprofit research-driven agency based in Raleigh, NC
Mission: To create a sustainable financial future for consumers
CEO: Dr. Diane Chen
Research Fellow: Patrick Royal
Research Project: Create a Machine Learning system to help credit counseling agencies (CCA) retain consumers enrolled in debt management plans (DMP)
#GlobalAIBootcamp
@DataCrunch_Lab
4. Agenda
• AI & ML, what’s the relationship?
• About Azure ML
• ML Case Study (with examples from Gallery)
5. AI & ML, what’s the relationship?
Source: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
6. ML/AI is currently being used in many sectors and business functions
Sectors: Retail, Healthcare, Financial, Industrial, Education, Pharmaceutical, Real Estate, Transportation, Advertising, Manufacturing, Legal, Utilities
Business functions: Marketing, Sales, Customer Experience, Human Resources
7. Use cases of ML/AI in business
Search
Sales lead scoring
Demand forecasting
Predictive maintenance
Fraud detection & prevention
Advertisement placement
Capacity planning
Dynamic pricing
Route planning
Increased revenue
Increased efficiency
Reduced cost
Increased customer satisfaction
11. Debt Management Plans (DMP)
“A debt management plan sets up a payment schedule for you to repay your debts, with the goal of helping creditors receive the money owed to them and ultimately improving your financial and credit standing.”
“It usually takes 3-5 years to complete payments under a debt management program, after which you may be able to reestablish credit.”
From National Foundation for Credit Counseling – www.nfcc.org
Photo by Francisco T Santos on Unsplash
12. Why use Machine Learning?
Historical data is already being used by credit counseling agencies.
However, they are currently not able to provide personalized service.
Photo by chuttersnap on Unsplash
13. What problem are we solving?
Organization’s Challenge
Improve customer retention in DMP program
ML Problem
Clustering
Classification
Regression
Recommender System
First step: Identify how long a new consumer is expected to stay in DMP program
15. Data is messy
Errors in data entry (Sex: F, W, female, california)
Calculation errors (Age: -1, 104)
Outliers (Debt: $1,765,234)
Many categories (Referral: Yahoo, Web, Organic)
Consult with subject matter expert to incorporate context and determine what is reasonable
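A minimal sketch of how values like these could be flagged for review, using pandas. The records, column names, and plausibility rules (`VALID_SEX`, `AGE_RANGE`, `DEBT_LIMIT`) are hypothetical and only echo the examples on this slide; the real thresholds should come from the subject matter expert.

```python
import pandas as pd

# Hypothetical records echoing the examples on the slide.
records = pd.DataFrame({
    "sex": ["F", "W", "female", "california"],
    "age": [-1, 104, 35, 42],
    "debt": [12_000, 1_765_234, 8_500, 23_000],
    "referral": ["Yahoo", "Web", "Organic", "Web"],
})

# Assumed plausibility rules; a subject matter expert should set the real ones.
VALID_SEX = {"F", "M"}
AGE_RANGE = (18, 100)
DEBT_LIMIT = 250_000

# Each flag marks rows that need review, not rows that are necessarily wrong.
flags = pd.DataFrame({
    "bad_sex": ~records["sex"].str.upper().isin(VALID_SEX),
    "bad_age": ~records["age"].between(*AGE_RANGE),
    "extreme_debt": records["debt"] > DEBT_LIMIT,
})

print(records.join(flags))
print(flags.sum())  # number of flagged rows per rule
```

A value such as "female" is flagged here even though it is recoverable; deciding whether to consolidate it with "F" or to discard the row is the kind of call that needs context.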
16. Cleaning Data
Checklist
Fields not known at enrollment time
Missing values
Fields with many zeros
Fields with near zero variance
Highly correlated fields
Outliers
Categorical fields with many different values
Data Leakage
Identify and determine how to treat these fields or values
- Ignore
- Substitute
- Remove
- Transform
- Consolidate
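Several checklist items can be screened automatically before deciding how to treat each field. The sketch below uses pandas rather than Azure ML Studio modules; the function `audit_fields`, its thresholds, and the sample columns are all hypothetical. Fields not known at enrollment time and data leakage are not covered, since spotting them requires knowledge of the business process.

```python
import pandas as pd

def audit_fields(df, target, zero_threshold=0.5, corr_threshold=0.9, max_categories=10):
    """Return checklist findings for the feature columns of df (thresholds are assumptions)."""
    features = df.drop(columns=[target])
    numeric = features.select_dtypes(include="number")
    categorical = features.select_dtypes(exclude="number")

    # Missing values: share of missing entries per field.
    missing = features.isna().mean()
    # Fields with many zeros: share of entries equal to zero.
    zero_share = (numeric == 0).mean()
    # Highly correlated fields: absolute pairwise correlation above the threshold.
    corr = numeric.corr().abs()
    cols = list(corr.columns)
    correlated = [
        (a, b)
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] > corr_threshold
    ]

    return {
        "missing": missing[missing > 0].to_dict(),
        "many_zeros": [c for c, share in zero_share.items() if share > zero_threshold],
        "near_zero_variance": [c for c in numeric.columns if numeric[c].var() < 1e-8],
        "highly_correlated": correlated,
        "many_categories": [c for c in categorical.columns
                            if categorical[c].nunique() > max_categories],
    }

# Hypothetical sample; the column names are illustrative only.
df = pd.DataFrame({
    "debt": [1000.0, 2000.0, 3000.0, 4000.0, None],
    "monthly_payment": [50.0, 100.0, 150.0, 200.0, 250.0],
    "num_cards": [0, 0, 0, 0, 1],
    "state": ["NC", "NC", "SC", "VA", "NC"],
    "months_in_program": [12, 20, 30, 36, 40],
})
report = audit_fields(df, target="months_in_program")
print(report)
```

The report only identifies candidates; choosing between ignore, substitute, remove, transform, or consolidate remains a per-field decision.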
17. Incorporating best practices in ML
[Bar chart of Mean Absolute Error: raw data 32, processed data 12, best algorithm 7]
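For reference, Mean Absolute Error is the average of the absolute differences between predicted and actual values. A small worked sketch follows; the numbers are made up for illustration and are not taken from the case study, and treating them as months is an assumption.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values, e.g. months a consumer stayed in a program.
actual_months = [10, 24, 36, 48]
predicted_months = [14, 20, 30, 50]

mae = mean_absolute_error(actual_months, predicted_months)
print(mae)  # (4 + 4 + 6 + 2) / 4 = 4.0
```

Because the error is expressed in the same unit as the target, a lower bar in the chart means predictions that are closer, on average, to what actually happened.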
18. “The No Free Lunch (NFL) theorem states that there is no [machine learning] model that works best for every problem.”
- Eric Cai
Based on work by David H. Wolpert, “The Lack of A Priori Distinctions between Learning Algorithms”, 1996
21. Understand the assumptions behind the algorithms
Linear regression
Predict numeric target
House sales price
Energy use
Taxi fare
Poisson regression
Predict count data
# calls received in a call center
# patients arriving in ER
# months in program
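The difference between the two assumptions can be seen on synthetic count data. This sketch uses scikit-learn's `LinearRegression` and `PoissonRegressor` as stand-ins for the corresponding Azure ML Studio modules; the data and coefficients are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, PoissonRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
# Synthetic counts whose log-mean is linear in the features (the Poisson assumption).
y = rng.poisson(np.exp(0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1]))

linear = LinearRegression().fit(X, y)
poisson = PoissonRegressor(alpha=1e-3).fit(X, y)

# Compare predictions at two corners of the feature space.
grid = np.array([[0.0, 1.0], [1.0, 0.0]])
print("linear :", linear.predict(grid))
print("poisson:", poisson.predict(grid))
```

Poisson regression models the logarithm of the expected count, so its predictions are always positive. Linear regression has no such constraint and can return negative values, which make no sense for a target such as the number of months in a program.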
22. Assessing performance of algorithms
Azure ML Studio provides modules to
Split Data
Partition and Sample
Cross Validate Model
Tune Model Hyperparameters
This is where Azure AutoML can help
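Outside of Studio, the same workflow can be sketched with scikit-learn. The functions below are rough analogues of the modules listed above, not the modules themselves: `train_test_split` for Split Data, `cross_val_score` for Cross Validate Model, and `GridSearchCV` for Tune Model Hyperparameters. The data is synthetic and the parameter grid is an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Hold out a test set that is never used for training or tuning.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Cross-validate one configuration on the training data only.
scores = cross_val_score(Ridge(alpha=1.0), X_train, y_train,
                         cv=5, scoring="neg_mean_absolute_error")
cv_mae = -scores.mean()

# Tune a hyperparameter with cross-validation, then check the held-out test set once.
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]},
                      cv=5, scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)
test_mae = mean_absolute_error(y_test, search.predict(X_test))

print(f"cross-validated MAE: {cv_mae:.3f}")
print(f"best parameters: {search.best_params_}, test MAE: {test_mae:.3f}")
```

Keeping the test set out of the tuning loop is what makes the final number a fair estimate of performance on new consumers.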
23. Which model is best for this data set?
Use test data set to assess performance
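One way to make this comparison concrete is to fit several candidates on the same training data and rank them by error on the same test set. The sketch below uses scikit-learn models on synthetic data; the candidate list is an assumption and is not the set of algorithms evaluated in the case study.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 3))
y = 5 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A naive baseline is included so every model has something to beat.
candidates = {
    "baseline (mean)": DummyRegressor(strategy="mean"),
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    results[name] = mean_absolute_error(y_test, model.predict(X_test))

# Lowest test error first.
for name, mae in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {mae:.2f}")
```

Which candidate wins depends on the data, which is the practical consequence of the No Free Lunch theorem.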
24. What is the model using to make predictions?
Does it make sense?
Should we use these fields?
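Permutation importance is one technique for answering these questions: shuffle one field at a time in the test set and measure how much the model's score drops. The sketch below uses scikit-learn's `permutation_importance` on synthetic data; the field names are hypothetical and the third field is deliberately irrelevant.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 500
feature_names = ["debt", "age", "noise"]  # hypothetical field names
X = rng.normal(size=(n, 3))
# The target depends strongly on "debt", weakly on "age", and not at all on "noise".
y = 4.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each field in the test set and record the average drop in score.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = sorted(zip(feature_names, result.importances_mean), key=lambda kv: -kv[1])

for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

A field that ranks suspiciously high deserves a second look: it may be a legitimate driver, or it may be leaking information that would not be available at enrollment time.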
25. “Start with the end in mind”
Deploying the algorithm requires
coordination with the organization
Options: Web service (API), Batch, Local
Photo by Matt Lamers on Unsplash
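For the batch and local options, the core mechanics are to save the trained model once and reload it wherever scoring happens. The sketch below shows this with Python's `pickle` and a scikit-learn model; it is a generic illustration, not how Azure ML Studio packages a web service, and the in-memory buffer stands in for a file or storage location.

```python
import io
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Train once (a stand-in for the real training pipeline).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X, y)

# Serialize the trained model; the buffer stands in for a file or blob store.
buffer = io.BytesIO()
pickle.dump(model, buffer)

# Later, in the batch job or local application: reload and score new records.
buffer.seek(0)
restored = pickle.load(buffer)
batch = np.array([[5.0], [6.0]])
predictions = restored.predict(batch)
print(predictions)
```

Whichever option is chosen, the scoring side needs the same fields, in the same format, that the model saw during training, which is why deployment has to be coordinated with the organization early.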
26. Key takeaways
• Follow industry best practices
• The ML problem is not the organization’s problem
• Yes, clean your data
• Compare multiple algorithms
• Be skeptical of your models
• Consider your options for deployment
27. Team capabilities
• Data science consulting
• Custom software development
• Machine Learning, Artificial Intelligence, and Cognitive technologies
• Big data & IoT Solutions
Innovation Awards
Grand Prize Winner
Highest Potential Value to Manufacturers