8. AWS MACHINE LEARNING
WHAT IT DOES
▸ Supervised Predictive Analytics
▸ On structured data and text
▸ Outcome as a function of variables, with ground truth known on a subset
WHAT IT DOES NOT
▸ Unsupervised learning
▸ Reinforcement learning
▸ Deep learning
9. WORK FLOW - AWS ML PROJECT
1. Create a datasource from S3, RDS or Redshift
2. Transform the data with recipes (optional)
3. Train a model
4. Evaluate
5. Create endpoints
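The five steps above can be sketched as an ordered list of SDK requests. This is a minimal sketch, not a definitive script: the operation and parameter names are the ones I recall from the boto3 `machinelearning` client and should be checked against the API reference, while `build_workflow`, the `demo` prefix, the bucket paths and the 70/30 split are hypothetical choices made for illustration. Nothing is sent to AWS; the function only builds the payloads.

```python
import json


def build_workflow(prefix, s3_data, s3_schema, recipe=None, model_type="BINARY"):
    """Return the workflow as ordered (operation, parameters) pairs."""

    def datasource(suffix, begin, end):
        # Assumed DataRearrangement format: a JSON string with a percent split.
        split = json.dumps({"splitting": {"percentBegin": begin, "percentEnd": end}})
        return ("create_data_source_from_s3", {
            "DataSourceId": f"{prefix}-{suffix}",
            "DataSpec": {
                "DataLocationS3": s3_data,
                "DataSchemaLocationS3": s3_schema,
                "DataRearrangement": split,
            },
            "ComputeStatistics": True,
        })

    train = {
        "MLModelId": f"{prefix}-model",
        "MLModelType": model_type,
        "TrainingDataSourceId": f"{prefix}-train",
    }
    if recipe is not None:
        # Step 2 is optional: the recipe travels with the training request.
        train["Recipe"] = json.dumps(recipe)

    return [
        datasource("train", 0, 70),   # step 1: datasource used for training
        datasource("eval", 70, 100),  # step 1: held-out datasource for evaluation
        ("create_ml_model", train),   # steps 2 and 3: recipe (optional) and training
        ("create_evaluation", {       # step 4
            "EvaluationId": f"{prefix}-evaluation",
            "MLModelId": f"{prefix}-model",
            "EvaluationDataSourceId": f"{prefix}-eval",
        }),
        ("create_realtime_endpoint", {"MLModelId": f"{prefix}-model"}),  # step 5
    ]


# Hypothetical bucket and file names.
steps = build_workflow("demo", "s3://my-bucket/data.csv", "s3://my-bucket/data.csv.schema")
for operation, params in steps:
    print(operation, sorted(params))
```

With credentials configured, each pair could presumably be dispatched as `getattr(client, operation)(**params)` on `boto3.client("machinelearning")`, waiting for each resource to complete before the next call.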
10. DATASOURCE
1. AWS extracts the schema
2. AWS analyses the data
3. Provides simple visualization
4. Offers default transformation
Sources: S3, Redshift, RDS (CLI - SDK only)
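To make the schema extraction step concrete, here is a rough sketch that guesses an attribute type for each column of a small CSV sample. The type-guessing rules are a naive heuristic of my own, not the service's actual logic; the JSON keys and the four attribute types (BINARY, CATEGORICAL, NUMERIC, TEXT) follow the schema format as I recall it, and the sample data and `infer_schema` name are hypothetical.

```python
import csv
import io


def guess_type(values):
    """Naive heuristic: binary, then numeric, then free text, else categorical."""
    if all(v in {"0", "1"} for v in values):
        return "BINARY"
    try:
        for v in values:
            float(v)
        return "NUMERIC"
    except ValueError:
        pass
    if any(" " in v.strip() for v in values):
        return "TEXT"
    return "CATEGORICAL"


def infer_schema(csv_text, target):
    """Build a schema dictionary from CSV text whose first row is a header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # transpose rows into columns
    return {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [
            {"attributeName": name, "attributeType": guess_type(col)}
            for name, col in zip(header, columns)
        ],
    }


# Hypothetical sample data.
sample = (
    "age,job,comment,subscribed\n"
    "34,admin,called twice last week,1\n"
    "51,technician,no answer,0\n"
)
schema = infer_schema(sample, target="subscribed")
for attribute in schema["attributes"]:
    print(attribute["attributeName"], attribute["attributeType"])
```

Reviewing the guessed types before training matters in practice, since a numeric code such as a postal code is often better treated as categorical.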
13. SCHEMA, RECIPES AND FEATURES
▸ From the data, AWS suggests a default set of transformations: the recipe
▸ 7 transformations are available
▸ Text: N-gram, Orthogonal Sparse Bigram, Lowercase, Punctuation
▸ TF-IDF by default, no stop words, no POS, Lemma, …
▸ Categorical: Cartesian product
▸ Numeric: Normalization, Quantile Binning
▸ Quantile binning (QB): turns numeric variables into categorical ones, capturing non-linearities in continuous variables
▸ Recipe is downloadable
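The transformations listed above can be illustrated with generic, textbook-style versions in plain Python. This is a sketch under stated assumptions, not the service's implementation: all function names are hypothetical, the OSB output format (underscores marking the gap between two words) reflects my reading of how orthogonal sparse bigrams are usually written, and the Cartesian product is shown as a row-wise crossing of two categorical columns.

```python
import bisect
import statistics
import string


def tokenize(text):
    """Lowercase, split on whitespace, strip leading/trailing punctuation."""
    stripped = (word.strip(string.punctuation) for word in text.lower().split())
    return [word for word in stripped if word]


def ngrams(tokens, n):
    """Contiguous word sequences of exactly n words."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def osb(tokens, window):
    """Pair each word with the following words inside the window."""
    pairs = []
    for i, word in enumerate(tokens):
        for gap in range(1, window):
            if i + gap < len(tokens):
                pairs.append(word + "_" * gap + tokens[i + gap])
    return pairs


def quantile_bin(values, n_bins):
    """Replace each number by the index of its quantile bin."""
    edges = statistics.quantiles(values, n=n_bins, method="inclusive")
    return [bisect.bisect_right(edges, v) for v in values]


def normalize(values):
    """Shift to zero mean and scale to unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant columns
    return [(v - mean) / std for v in values]


def cartesian(column_a, column_b):
    """Cross two categorical columns row by row into one combined category."""
    return [f"{a}_{b}" for a, b in zip(column_a, column_b)]


tokens = tokenize("The quick, brown Fox!")
print(ngrams(tokens, 2))  # ['the quick', 'quick brown', 'brown fox']
print(osb(tokens, 3))
print(quantile_bin([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(cartesian(["red", "blue"], ["S", "M"]))  # ['red_S', 'blue_M']
```

After binning, the bin index is treated as a category, so a linear model can learn a separate weight per bin.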
25. STRONG POINTS
▸ Powerful modeling: SGD + quantile binning
▸ AWS ecosystem
▸ Multiple sources (S3, RDS, Redshift)
▸ Simple to set up and use
▸ Great for benchmarking
▸ No need for production code!
▸ CLI - SDKs (Python, …)
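Why "SGD + quantile binning" is a strong combination can be shown on a toy problem. The sketch below is my own illustration, not the service's training code: a logistic regression fitted by per-sample gradient steps is trained once on a raw (rescaled) numeric feature and once on a one-hot encoding of its quantile bins. The data set, learning rate and epoch count are hypothetical, and samples are visited in a fixed order so the run is deterministic.

```python
import bisect
import math
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgd(rows, labels, epochs=300, lr=0.1):
    """Logistic regression fitted with one gradient step per sample."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * f for w, f in zip(weights, features)))
            error = p - y
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def accuracy(rows, labels, weights, bias):
    hits = 0
    for features, y in zip(rows, labels):
        score = bias + sum(w * f for w, f in zip(weights, features))
        hits += int((score > 0) == (y == 1))
    return hits / len(rows)


# Toy target that is not monotone in x: positive only in the middle range.
xs = list(range(30))
ys = [1 if 10 <= x < 20 else 0 for x in xs]

# Raw feature, rescaled to [0, 1].
raw_rows = [[x / 29.0] for x in xs]

# Same feature, quantile-binned into 3 bins and one-hot encoded.
edges = statistics.quantiles(xs, n=3, method="inclusive")
binned_rows = []
for x in xs:
    one_hot = [0.0, 0.0, 0.0]
    one_hot[bisect.bisect_right(edges, x)] = 1.0
    binned_rows.append(one_hot)

acc_raw = accuracy(raw_rows, ys, *train_sgd(raw_rows, ys))
acc_binned = accuracy(binned_rows, ys, *train_sgd(binned_rows, ys))
print(f"raw feature: {acc_raw:.2f}, binned feature: {acc_binned:.2f}")
```

A single threshold on the raw feature can classify at most 20 of the 30 points correctly, whereas the binned version gives the linear model one weight per range and can separate the classes.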
26. ROOM FOR IMPROVEMENT
▸ No cross validation!
▸ Can’t export your trained models*
▸ No scripting
▸ Limited data visualization
▸ Limited feature engineering
▸ SGD model only: no forests, SVMs, Bayes, …
▸ No deep learning (EC2)
(*) Stealing Machine Learning Models via Prediction APIs
http://www.cs.unc.edu/~reiter/papers/2016/USENIX.pdf
27. GREAT TIME SAVER BUT STILL A NEED FOR DOMAIN EXPERTISE AND FEATURE ENGINEERING