WhizzML is a new domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML offers out-of-the-box scalability, abstracts away the complexity of the underlying infrastructure, and helps analysts, developers, and scientists reduce the burden of repetitive and time-consuming analytics tasks.
2. BigML, Inc 2
Spring 2016 Release
Speaker: POUL PETERSEN (CIO)
Moderator: ATAKAN CETINSOY (VP Predictive Applications)
Questions: Enter questions into the chat box; we'll answer some via chat and others at the end of the session
Resources: https://bigml.com/releases
Contact: info@bigml.com
Twitter: @bigmlcom, @whizzml
4. BigML, Inc 4
Promise of ML
Have: Data
Want: Automated Insights
•Reduce churn
•Increase conversion
•Improve diagnosis
•Reduce fraud
•Etc.
5. BigML, Inc 5
ML Hurdles
•Which algorithms?
•How to scale it?
•How to handle real data?
•How to tune it?
•How to automate it?
6. BigML, Inc 6
Current Resources
Data Exploration: SOURCE, DATASET, CORRELATION, STATISTICAL TEST
Supervised Learning: MODEL, ENSEMBLE, LOGISTIC REGRESSION, EVALUATION
Unsupervised Learning: CLUSTER, ANOMALY DETECTOR, ASSOCIATION DISCOVERY
Scoring: PREDICTION, BATCH PREDICTION
Automation: SCRIPT, LIBRARY, EXECUTION
7. BigML, Inc 7
BigML Vision
Paving the Path to Automatic Machine Learning
(Diagram axes: time, from 2011 to Spring 2016, and Automation)
A. REST API, Programmable Infrastructure: Sauron
• Automatic deployment and auto-scaling
B. Wintermute
• Distributed Machine Learning Framework
C. Data Generation and Filtering: Flatline
• DSL for transformation and new field generation
D. Workflow Automation: WhizzML
• DSL for programmable workflows
E. Automatic Model Selection: SMACdown
• Automatic parameter optimization
8. BigML, Inc 8
Workflow Map
SOURCE
DATASET: Flatline, Flatline Editor
STATS: Statistical Tests, Correlations
MODEL: Decision Trees, Bagging, Decision Forest, Logistic Regression
CLUSTER: K-Means, G-Means
ANOMALY: Isolation Forest
ASSOCIATION: Magnum Opus
PREDICTION: Batch Prediction, Batch Anomaly, Batch Centroid, Evaluation
10. BigML, Inc 10
Regular Workflows
DEALS DATASET → FILTER → SOLD HOMES → NEW FEATURES → MODEL
DEALS DATASET → FILTER → FORSALE HOMES → NEW FEATURES → BATCH PREDICTION (using MODEL)
11. BigML, Inc 11
Model Selection
SOURCE → DATASET → TRAINING / TEST
TRAINING → MODEL, ENSEMBLE, LOGISTIC REGRESSION
TEST → EVALUATION, EVALUATION, EVALUATION (one per candidate)
CHOOSE
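The split, train, evaluate, choose loop on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: the two toy learners (`train_majority`, `train_threshold`) are hypothetical stand-ins for the ensemble and logistic regression resources, the data is synthetic, and accuracy stands in for whatever evaluation metric is actually used. None of the names below come from BigML or WhizzML.

```python
import random

def split_dataset(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and split them into (training, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, rows):
    """Fraction of (features, label) rows that predict() labels correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def train_majority(training):
    """Toy baseline: always predict the most common training label."""
    labels = [y for _, y in training]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def train_threshold(training):
    """Toy learner: cut the first feature halfway between the class means."""
    pos = [x[0] for x, y in training if y == 1]
    neg = [x[0] for x, y in training if y == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1
    return lambda x: 1 if sign * (x[0] - cut) > 0 else 0

def select_model(rows, trainers, test_fraction=0.25, seed=0):
    """Train every candidate on TRAINING, evaluate on TEST, choose the best."""
    training, test = split_dataset(rows, test_fraction, seed)
    scores = {name: accuracy(train(training), test)
              for name, train in trainers.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic, cleanly separable data: low values are class 0, high values class 1.
rows = ([((i / 100,), 0) for i in range(20)]
        + [((i / 100,), 1) for i in range(80, 100)])
trainers = {"majority": train_majority, "threshold": train_threshold}
best, scores = select_model(rows, trainers)
print(best, scores)
```

Every candidate sees the same training and test rows, so the scores are directly comparable; a real workflow would typically add cross-validation before committing to a choice.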
12. BigML, Inc 12
Model Tuning
SOURCE → DATASET → TRAINING / TEST
TRAINING → ENSEMBLE N=10, ENSEMBLE N=20, ENSEMBLE N=1000
TEST → EVALUATION, EVALUATION, EVALUATION (one per ensemble)
CHOOSE
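As a rough sketch of the tuning loop on this slide, the Python below trains a bagged ensemble for each candidate size (10, 20, 1000, as on the slide), evaluates each on the same test rows, and keeps the best. The one-feature "stump" learner, the synthetic noisy data, and all function names are assumptions made for illustration; they are not BigML's ensemble implementation.

```python
import random

def train_stump(rows):
    """Fit a one-feature threshold classifier on (x, label) rows."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:
        # Degenerate bootstrap sample with a single class: predict it always.
        constant = 1 if pos else 0
        return lambda x: constant
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (pos_mean + neg_mean) / 2
    flip = pos_mean < neg_mean
    return lambda x: int((x > cut) != flip)

def train_ensemble(rows, n, seed=0):
    """Bagging: n stumps, each fit on a bootstrap resample; majority vote."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(rows) for _ in rows]) for _ in range(n)]
    return lambda x: int(2 * sum(stump(x) for stump in stumps) > n)

def accuracy(predict, rows):
    """Fraction of (x, label) rows that predict() labels correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def tune_ensemble_size(training, test, sizes=(10, 20, 1000)):
    """Evaluate one ensemble per size; ties go to the first size listed."""
    scores = {n: accuracy(train_ensemble(training, n), test) for n in sizes}
    best_n = max(scores, key=scores.get)
    return best_n, scores

# Synthetic data: the label depends on x plus Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, int(x + rng.gauss(0, 0.15) > 0.5)))
training, test = data[:150], data[150:]

best_n, scores = tune_ensemble_size(training, test)
print(best_n, scores)
```

Trying a handful of sizes by hand scales poorly once several parameters interact, which is the motivation for automating the search.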
13. BigML, Inc 13
SMACdown
•How many models?
•How many nodes?
•Missing splits or not?
•Number of random candidates?
•Balance the objective?
SMACdown can tell you!
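To make the idea of searching over these questions concrete, here is a minimal sketch that uses plain random search. It is explicitly not SMACdown or SMAC (which build a model of the objective to decide what to try next); the parameter names, value ranges, and the `toy_evaluate` objective are all hypothetical placeholders. A real objective would train an ensemble with the sampled configuration and return its evaluation score.

```python
import random

# Hypothetical search space mirroring the questions on the slide.
SEARCH_SPACE = {
    "number_of_models": [10, 20, 50, 100],
    "node_threshold": [64, 256, 512, 1024],
    "missing_splits": [True, False],
    "random_candidates": [1, 2, 4, 8],
    "balance_objective": [True, False],
}

def sample_config(rng):
    """Draw one value per parameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(evaluate, trials=25, seed=0):
    """Try `trials` random configurations and keep the highest-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_evaluate(config):
    """Placeholder objective, invented for this sketch only."""
    score = -abs(config["number_of_models"] - 50) / 100
    score += 0.1 if config["balance_objective"] else 0.0
    return score

best_config, best_score = random_search(toy_evaluate)
print(best_config, best_score)
```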
14. BigML, Inc 14
Best-First Features
Round 1: evaluate {F1} {F2} {F3} {F4} ... {Fn}
CHOOSE BEST → S = {Fa}
Round 2: evaluate S+{F1} S+{F2} S+{F3} S+{F4} ... S+{Fn-1}
CHOOSE BEST → S = {Fa, Fb}
Round 3: evaluate S+{F1} S+{F2} S+{F3} S+{F4} ... S+{Fn-1}
CHOOSE BEST → S = {Fa, Fb, Fc}
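The rounds on this slide amount to greedy forward selection, which can be sketched as follows. The `score` callback is a hypothetical stand-in for "train a model on these features and evaluate it", the toy scoring function and its numbers are invented for the example, and stopping when no candidate improves the score is an assumption (the slide does not state a stopping rule).

```python
def best_first_features(features, score):
    """Greedily grow a feature set S, adding the best feature each round."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Score S + {F} for every feature F not yet in S and pick the best.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best_score:
            break  # assumed stopping rule: no candidate improves on S
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score

# Toy score: each feature contributes a fixed gain, minus a cost per feature.
GAIN = {"F1": 10, "F2": 50, "F3": 30, "F4": 0}

def toy_score(subset):
    return sum(GAIN[f] for f in subset) - 5 * len(subset)

selected, final_score = best_first_features(["F1", "F2", "F3", "F4"], toy_score)
print(selected, final_score)  # → ['F2', 'F3', 'F1'] 75
```

Each round costs one evaluation per remaining feature, so with n features the search needs on the order of n² evaluations rather than the 2ⁿ required to try every subset.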
17. BigML, Inc 17
Why Workflows
•Machine Learning is iterative by nature.
•ML tools still require many repetitive (and manual)
tasks.
•Instead of helping users focus on the output, many
tools force analysts, developers, and scientists to
focus on infrastructure, parallelism, etc.
•Not everybody can implement complex workflows
or meta-algorithms, but many people can reuse
them.
18. BigML, Inc 18
WhizzML Features
•A Domain-Specific Language (DSL) for
automating Machine Learning workflows.
•Complete programming language.
•Machine Learning “operations” are first-class
citizens.
•Scale is provided for free.
•API First! - Everything is composable.
41. BigML, Inc 41
Conclusion
•Automation is critical to fulfilling the promise of ML.
•WhizzML can create workflows that:
  •Automate repetitive tasks.
  •Automate model tuning and feature selection.
  •Combine ML models into more powerful algorithms.
  •Create shareable and re-usable executions.