1. AGENDA
• About me
• Predictive Analytics
• Amazon Machine Learning (ML)
• Amazon ML – Key Concepts
• Amazon ML – Datasources
• Amazon ML – Models
• Amazon ML – Evaluations
• Amazon ML – Demo
AN INTRO TO AWS MACHINE LEARNING
PREDICTIVE ANALYTICS
2. ABOUT ME
NVISIA® Confidential 2016
Naveen VK
• Principal Architect at NVISIA, a regional software development company
• Worked for NVISIA for over 17 years
• Designed and built custom multi-tier applications using Java Enterprise stack for various companies
• Involved in the entire application development lifecycle, including requirements gathering, architecture, design, implementation, integration, testing and deployment
• Some clients: ETF - State of WI, American Family, Harley Davidson, Cumulus Media
• Currently working at ETF (Employee Trust Fund)
• Manage pensions, insurance and other benefits for state and local employees
• Involved in multiple projects (5) and currently supporting multiple applications (7)
• Deep expertise in databases such as Oracle (since 1994) and DB2 (since 1999), and in SQL queries and PL/SQL stored procedures
• 3 fun facts about myself
4. PREDICTIVE ANALYTICS
What is it?
• Mining data, using statistical algorithms and machine learning to predict trends or probabilities
• Use historical data and the patterns in it to predict the future
• Create models based on patterns in data to predict the probability of something happening in the future
• The better the model and the training data, the better the prediction
Examples
• Is this email spam?
• Will this product sell?
• How many units of this product will sell?
• Is this product a piece of clothing, a book or a movie?
• What price will this house sell for?
• What will be the temperature here tomorrow?
6. AMAZON MACHINE LEARNING (ML)
• AWS (Amazon Web Services) cloud-based service for predictive analytics
• Use tools and wizards to create machine learning models
• Use simple APIs to obtain predictions for your application
• No need to write custom code or have supporting infrastructure
• Finds patterns in your existing data
• Use models to process new data and generate predictions
When to use ML?
• ML is not a solution for every type of problem
• Skip ML when the target value can be determined by coding simple rules, computations and steps without any data-driven learning
• Use ML when the rules cannot be programmed easily
• Too many factors
• Too many overlapping rules
• Too much fine tuning of rules
• Use ML when a manual or rule-based solution cannot be scaled
• 100s of Millions vs. 100s (Example: manual vs. automated spam filter)
7. AMAZON ML – KEY CONCEPTS
Terms and concepts
8. AMAZON ML – KEY CONCEPTS
Datasources
• Contains metadata associated with data inputs to the ML
• Spreadsheets, CSV files, streaming data, relational databases
ML Models
• Patterns in data to generate predictions
Evaluations
• Measure the quality of ML models
Batch Predictions
• Multiple data inputs aka batch data
• Asynchronous
Realtime Predictions
• Individual data inputs
• Synchronous
9. AMAZON ML – DATASOURCES
Details of datasources in Amazon ML
10. AMAZON ML – DATASOURCES
• In Amazon ML, a datasource contains only the metadata about the actual input data
• Actual data may be stored in
• Amazon S3 buckets
• Amazon Redshift Databases
• MySQL databases in Amazon Relational Database Service (RDS)
• Amazon Kinesis
• Attributes
• Column headings represent attributes
• Unique
• Required
• Target Attribute
• The data that is being predicted
• Training data must include the target attribute with its value already known (labeled)
• Observation
• Single row of data
• Input data
• All observations aka Rows in spreadsheet/csv file or database
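The datasource terms above can be made concrete with a small sketch. The CSV content, the column names and the `describe` helper below are hypothetical and only illustrate how attributes, the target attribute and observations relate to each other; they are not part of Amazon ML.

```python
import csv
import io

# Hypothetical training data; column names and values are illustrative only.
RAW = """product_id,price,category,sold
p1,9.99,book,1
p2,24.50,apparel,0
p3,14.00,movie,1
"""

TARGET = "sold"  # assumed name of the target attribute


def describe(raw_csv, target):
    """Return (attributes, observations) for a CSV string with a header row."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    observations = list(reader)      # each row is one observation
    attributes = reader.fieldnames   # column headings are the attributes
    if len(set(attributes)) != len(attributes):
        raise ValueError("attribute names must be unique")
    if target not in attributes:
        raise ValueError("training data must include the target attribute")
    return attributes, observations


attributes, observations = describe(RAW, TARGET)
print("attributes:", attributes)
print("observations:", len(observations))
print("target of first observation:", observations[0][TARGET])
```

All rows together form the input data; the `sold` column plays the role of the target attribute whose value is already known in the training data.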
11. AMAZON ML – DATASOURCES CONTINUED
• Schema
• All attributes and corresponding data-types of input data
• Location
• Location of input data stored in, say, Amazon S3 bucket
• Row ID
• Attribute flagged to be included in prediction output
• Helps cross-reference the prediction with the observation
• Unique for each observation
• Optional
• Datasource Name
• Human readable name of the datasource
• Optional
• Statistics
• Summary stats for each attribute of input data
• Status
• Current state of the datasource, such as In Progress, Completed or Failed
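As a rough illustration of the schema and Row ID concepts, the sketch below builds a schema document for a hypothetical product dataset. The field names (`targetAttributeName`, `rowId`, `attributeName`, `attributeType` and so on) are modeled on the Amazon ML data schema format as best recalled and should be checked against the developer guide; the `build_schema` helper and the attribute names are assumptions made for this example.

```python
import json


def build_schema(attribute_types, target, row_id=None):
    """Build a schema dict; attribute_types maps attribute name to data type."""
    if target not in attribute_types:
        raise ValueError("target must be one of the attributes")
    if row_id is not None and row_id not in attribute_types:
        raise ValueError("row ID must be one of the attributes")
    schema = {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [
            {"attributeName": name, "attributeType": dtype}
            for name, dtype in attribute_types.items()
        ],
    }
    if row_id is not None:
        # Optional: echoed in the prediction output for cross-referencing.
        schema["rowId"] = row_id
    return schema


# Hypothetical attributes for a "will this product sell?" dataset.
schema = build_schema(
    {
        "product_id": "CATEGORICAL",
        "price": "NUMERIC",
        "category": "CATEGORICAL",
        "sold": "BINARY",
    },
    target="sold",
    row_id="product_id",
)
print(json.dumps(schema, indent=2))
```

Leaving out `row_id` produces a schema without a Row ID, which matches its optional status in the list above.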
12. AMAZON ML – MODEL
Details of mathematical model in Amazon ML
13. AMAZON ML – MODEL
• In Amazon ML, a model finds patterns in data and generates predictions
• Three distinct types of models
• Binary
• Multiclass
• Regression
• Type of model chosen based on the type of target to predict
• Binary Model
• Predicts values that have 1 of 2 states: true/false, 1/0, win/lose, alive/dead, pass/fail, healthy/sick
• Uses industry-wide standard learning algorithm called Binary Logistic Regression Algorithm
• Statistical model used to predict the probability of a binary response based on certain variables
• Examples
• Is this email spam?
• Will this product sell?
• Multiclass Model
• Predicts values that belong to a pre-defined, limited set of states (1 of 3 or more states)
• Uses industry-wide standard learning algorithm called Multinomial Logistic Regression Algorithm
• Examples
• Is this product a book, a movie or apparel?
• Is this movie a thriller, a documentary or a comedy?
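The idea behind the two logistic regression variants can be sketched in a few lines. This is a textbook illustration, not Amazon ML's implementation: the weights, bias and class scores are made-up numbers, and in a real model they would be learned from training data.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_probability(weights, bias, features):
    """Binary logistic regression: probability that the target is 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def multiclass_probabilities(scores):
    """Multinomial logistic regression: softmax over per-class scores."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical learned parameters and inputs.
p_spam = binary_probability(weights=[1.2, -0.7], bias=-0.3, features=[2.0, 1.0])
probs = multiclass_probabilities({"book": 2.0, "movie": 0.5, "apparel": -1.0})
print("P(spam) =", round(p_spam, 3))
print("most likely class:", max(probs, key=probs.get))
```

The binary model yields a single probability, while the multiclass model yields one probability per class, and those probabilities sum to 1.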
14. AMAZON ML – MODEL
• Regression Model
• Predicts a numeric value
• For regression problems
• Uses industry-wide standard learning algorithm called Linear Regression Algorithm
• Statistical model to predict the value of y based on a number of variables x1, x2, x3, etc.
• Examples:
• What will the temperature be tomorrow?
• How many units of this product will sell?
• How much will this house sell for?
• Recipe
• Attributes and attribute transformations available to train the model
• Model size
• In MB
• Grows with the number of patterns stored in the model
• Number of passes
• The number of times the datasource is used when training the model
• Regularization
• ML technique that penalizes extreme weight values to reduce overfitting and yield higher quality models
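To show how the number of passes and regularization enter training, here is a minimal sketch of linear regression fitted with stochastic gradient descent. It is a generic illustration rather than Amazon ML's training code; the function name, the learning rate, the L2 penalty values and the toy data (y = 2x + 1) are all assumptions.

```python
def train_linear(rows, passes=10, learning_rate=0.05, l2=0.0):
    """Fit y = bias + sum(w_i * x_i) with stochastic gradient descent.

    rows is a list of (features, y) pairs. Each pass visits every row once;
    l2 is the strength of an L2 penalty that pulls weights toward zero.
    """
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(passes):
        for features, y in rows:
            predicted = bias + sum(w * x for w, x in zip(weights, features))
            error = predicted - y
            weights = [
                w - learning_rate * (error * x + l2 * w)
                for w, x in zip(weights, features)
            ]
            bias -= learning_rate * error
    return weights, bias


# Toy data generated from y = 2x + 1 (illustrative only).
rows = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]

weights, bias = train_linear(rows, passes=500)
reg_weights, reg_bias = train_linear(rows, passes=500, l2=0.5)
print("no regularization:", weights, bias)
print("with L2 penalty:  ", reg_weights, reg_bias)
```

More passes let the weights settle closer to the underlying pattern, while a larger `l2` value shrinks them, trading some fit on the training data for a simpler model.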
15. AMAZON ML – EVALUATIONS
Evaluate the model in Amazon ML
16. AMAZON ML – EVALUATIONS
• In Amazon ML, an evaluation measures the quality of the ML model
• Need to evaluate a model to determine if it will do a good job predicting the target on new/future data
• Need labeled data, where the target value is already known, to train and evaluate a model
• Max size of training data: 100 GB (each observation can be at most 100 KB)
• Model Insight
• Amazon ML will provide metrics and insights to review accuracy of the model
• Overall success metric of the model
• Visualizations to explore accuracy of model
• Alerts to check validity of evaluation
• Focus on Binary Insights only for this presentation
17. AMAZON ML – EVALUATIONS – BINARY INSIGHTS
• Prediction score
• Actual output of the binary prediction
• Indicates the system’s certainty that the given observation has target value of 1
• Output scores of observations are between 0 and 1
• Default threshold score aka cut-off is 0.5, this can be changed
• Any observation that scores above cut-off is predicted as target=1 and below cut-off is predicted as 0
• Correct predictions
• True Positive (TP)
• Predicted value of target = 1, true value of target = 1
• True Negative (TN)
• Predicted value of target = 0, true value of target = 0
• Incorrect predictions
• False Positive (FP)
• Predicted value of target = 1, true value of target = 0
• False Negative (FN)
• Predicted value of target = 0, true value of target = 1
• Area Under the Curve (AUC)
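• Measures the ability of the model to score positive observations higher than negative ones
• AUC near 1 indicates the model is highly accurate
• AUC near 0.5 indicates the model is no better than random guessing
• AUC near 0 is unusual: the predictions are consistently inverted, which typically points to a problem with the data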
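The binary insights above can be worked through on a handful of scored observations. The scores and labels below are invented for illustration, and treating a score exactly equal to the cut-off as a 0 prediction is an assumption, since the slide only covers scores above and below it. AUC is computed here as the fraction of positive/negative pairs in which the positive observation receives the higher score.

```python
def confusion_counts(scores, labels, cutoff=0.5):
    """Count TP, TN, FP and FN for a given cut-off."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, actual in zip(scores, labels):
        predicted = 1 if score > cutoff else 0  # assumption: ties predict 0
        if predicted == 1:
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            counts["TN" if actual == 0 else "FN"] += 1
    return counts


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical prediction scores and true target values.
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0, 0]

print(confusion_counts(scores, labels))              # default cut-off of 0.5
print(confusion_counts(scores, labels, cutoff=0.35))
print("AUC:", auc(scores, labels))
print("AUC with flipped labels:", auc(scores, [1 - y for y in labels]))
```

Lowering the cut-off trades false negatives for false positives, while AUC does not depend on the cut-off at all. Flipping the labels shows why an AUC near 0 means the model's ranking is inverted rather than random.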
19. AMAZON ML – DEMO – BINARY MODEL
• Demo
• Simple – predicting will this product sell?
• Not so simple – predicting will this person survive?
• Checklist
• Predictive Analytics
• Amazon Machine Learning (ML)
• Amazon ML – Key Concepts
• Amazon ML – Datasources
• Amazon ML – Models
• Amazon ML – Evaluations
• Amazon ML – Demo
• Pricing
• https://aws.amazon.com/machine-learning/pricing/
• Data analysis and model building: $0.42/hr
• Batch predictions: $0.10 per 1,000 predictions (rounded up to the next 1,000)
• Realtime predictions: $0.0001/transaction (rounded to nearest penny)
• S3 Standard storage: $0.03/GB/month
• Questions
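As a worked example of the Amazon ML rates listed under Pricing, the sketch below estimates the cost of a small workload, reading the first rate as $0.42 per hour. The workload numbers and the `estimate_cost` helper are hypothetical, storage cost is left out, and the final rounding to the nearest penny is not applied.

```python
import math

# Rates as listed on the pricing slide.
COMPUTE_PER_HOUR = 0.42           # data analysis and model building
BATCH_PER_1000 = 0.10             # batch predictions, billed per 1,000
REALTIME_PER_PREDICTION = 0.0001  # realtime predictions


def estimate_cost(compute_hours, batch_predictions, realtime_predictions):
    """Estimate the Amazon ML charges for one workload, in dollars."""
    batch_blocks = math.ceil(batch_predictions / 1000)  # round up to next 1,000
    return (
        compute_hours * COMPUTE_PER_HOUR
        + batch_blocks * BATCH_PER_1000
        + realtime_predictions * REALTIME_PER_PREDICTION
    )


# Hypothetical workload: 2 compute hours, 1,500 batch and 10,000 realtime predictions.
cost = estimate_cost(compute_hours=2, batch_predictions=1500, realtime_predictions=10000)
print(f"estimated cost: ${cost:.2f}")
```

The 1,500 batch predictions are billed as 2 blocks of 1,000, so the estimate is 0.84 + 0.20 + 1.00 dollars.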
20. THANK YOU FOR COMING
Links:
http://docs.aws.amazon.com/machine-learning/latest/dg/what-is-amazon-machine-learning.html
https://www.kaggle.com/
Contact Info:
LinkedIn: Naveen VK
Email: naveen@nvisia.com (work)
naveenvkm@gmail.com (personal)
Github: https://github.com/navnoon23/