AutoML is an approach to automating machine learning that aims to:
- Generate new machine learning pipelines
- Find good model pipelines by capturing and representing knowledge to "expertly" choose pipelines
- Use reinforcement learning with real-time human input to guide iteration between pipelines, matching how experts iterate
The key advantage of this AutoML approach is that it more closely mirrors how experts work by incorporating reasoning, knowledge representation, and human feedback into the process. This allows the algorithm to provide interpretable results and to focus future trials based on learned performance, targeting faster results than traditional AutoML methods.
IAC 2024 - IA Fast Track to Search Focused AI Solutions
AutoML for Expert Data Science
1. AutoML
Productivity for Data Science, AND …
A better way to make Digital Decisions
Dr. Steven Gustafson
Chief Scientist, Maana (2+ years)
previously, GE Research (10+ years)
(before previously, PhD AI for “automatic programming”)
2. What do you take away?
Observe my arguments about AutoML and the algorithm
Reason about the evidence, consider past experience
Decide to change your Data Science approach
Learn by experimentation and feedback
3. My Argument
• Generate new knowledge
• Find good model pipelines
• Allow your experts and data scientists to
understand, learn and improve models that drive
business decisions!
• We created our AutoML as an archetype for
architecting digital decisions!
4. AutoML
• Generate and tune ML pipeline
• Auto-WEKA, Auto-SKLEARN, Google NN, Azure ML, …, TPOT
• Mostly Bayesian learning or raw computation rather than learned improvement
• Black box – helps find solutions, not knowledge or wisdom
• Assumes future problems are represented by the data
• Biased by what code and data are available vs. what’s useful
• Can be very long running – hours to days
5. Expert Data Science
Observe the Problem, Data, Background Knowledge
Reason about data characteristics vs. goals vs. techniques
Decide on initial approaches
Learn from results and iterate
6. Expert Data Science Vs. AutoML
AutoML:
• Massive compute
• Optimize many, many parameters
• Blind search, etc.?
• How do you explain results? Justify the compute budget?
• Engage an SME?
• Does the Data Scientist learn?
Expert Data Science:
• Observe the Problem, Data, Background Knowledge
• Reason about data characteristics
• Decide on initial approaches
• Learn from results and iterate
=?
7. What if AutoML…
• Capture & represent knowledge
• Use reasoning to ”expertly” choose pipelines
• Use reinforcement learning with human input in real-time to guide iteration
• Target seconds and minutes for results instead of hours and days, matching expert iteration
8. What is Knowledge Representation?
• A surrogate, a substitute for the thing itself.
• Enable an entity to determine consequences by thinking rather than
acting.
• A “language” in which we say things about the world.
• A “theory” of intelligent reasoning: which types of reasoning are sanctioned, and which apply given the data
• Guidance for organizing information to facilitate inferences to get
new expressions from old.
• A KR is not a data structure. A KR must be implemented in the
machine by some data structure.
http://groups.csail.mit.edu/medg/ftp/psz/k-rep.html
9. Program Search for Machine Learning Pipelines Leveraging
Symbolic Planning and Reinforcement Learning
F. Yang, S. Gustafson, A. Elkholy, D. Lyu, B. Liu. Program Search for Machine
Learning Pipelines Leveraging Symbolic Planning and Reinforcement Learning. In
Genetic Programming Theory and Practice XVI. 2018. Springer.
10. Symbolic planning
• Symbolic planning uses logical formalisms to represent dynamic systems and automated algorithms to generate plans
• A plan is a sequence of actions that achieves the goal state from an initial state
• Given a common action description language (such as B, C, C+, or BC), plans can be automatically computed using an ASP solver such as Clingo.
Data science contains a set of actions that transform and fit data.
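A toy stand-in for this planning step (the actual system uses the BC action language with the Clingo ASP solver; the stage and method names below are simplified labels, not the paper's action descriptions):

```python
from itertools import product

# Toy planner: a "plan" is an ordered featurizer -> preprocessor -> classifier
# action sequence that takes raw labeled text (initial state) to a fitted
# model (goal state). Stage/method names are illustrative.
STAGES = {
    "featurizer": ["count_vectorizer", "tfidf_vectorizer"],
    "preprocessor": ["truncated_svd", "select_k_best", "no_preprocessing"],
    "classifier": ["logistic_regression", "linear_svm", "random_forest"],
}

def satisfying_plans(order=("featurizer", "preprocessor", "classifier")):
    """Enumerate every action sequence satisfying the stage ordering."""
    return list(product(*(STAGES[stage] for stage in order)))

plans = satisfying_plans()
```

A real ASP encoding would also express preconditions (e.g. a classifier requires numeric features), pruning this enumeration before anything is executed.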
12. Pipelines
• Featurizers
– Count / bag of words Vectorizer
– Tfidf Vectorizer
• Preprocessors
– matrix decompositions (truncatedSVD, pca, kernelPCA, fastICA)
– kernel approximation (rbfsampler, nystroem)
– feature selection (selectkbest, selectpercentile)
– scaling (minmaxscaler, robustscaler, maxabsscaler)
– no preprocessing
• Classifiers
– logistic regression
– gaussian naive Bayes
– linear SVM
– random forest
– multinomial naive Bayes
– stochastic gradient descent
Nystroem: approximates a kernel map using a subset of the training data.
KernelPCA: Kernel Principal Component Analysis (KPCA).
fastICA: a fast algorithm for Independent Component Analysis.
truncatedSVD: performs linear dimensionality reduction by means of truncated singular value decomposition (SVD).
RBFSampler: approximates the feature map of an RBF kernel by Monte Carlo approximation of its Fourier transform.
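One pipeline from this menu can be assembled with scikit-learn; the six-document corpus below is made up for illustration and is not the data or configuration used in the experiments later in the talk:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

# Tiny illustrative corpus (not from the paper's experiments).
texts = ["great movie", "terrible film", "loved the story", "awful acting",
         "wonderful and moving", "boring plot"]
labels = [1, 0, 1, 0, 1, 0]

pipe = Pipeline([
    ("featurizer", TfidfVectorizer()),               # text -> tf-idf matrix
    ("preprocessor", TruncatedSVD(n_components=2)),  # linear dim. reduction
    ("classifier", LogisticRegression()),
])
pipe.fit(texts, labels)
train_acc = pipe.score(texts, labels)
```

Swapping any step for another entry in the menu above yields a different point in the same search space, which is exactly what the planner enumerates.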
13. Reinforcement Learning
• Find a policy, i.e., a mapping from state to action, such that the agent accumulates maximal reward
• Learns the policy by trial and error: execute actions in the environment, obtain reward, and update the estimated value function until value iteration converges
• R-learning updates R(s,a) and rho(s), which reflect the long-term undiscounted average reward and gain, targeting finite-horizon problems (a fixed number of future steps)
Data scientists perform trial and error on different ML pipelines to understand the most effective pipeline and hyper-parameters, similar to performing a reinforcement learning process
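A minimal single-state R-learning sketch of that analogy; the two "pipelines", their mean rewards, and the learning rates are assumptions for illustration, not values from the paper:

```python
import random

# Two candidate "pipelines" act as actions in a single state. R(s,a) tracks
# the average-reward-adjusted value of each action; rho tracks the long-term
# undiscounted average reward under the (mostly) greedy policy.
random.seed(0)
mean_reward = {"pipeline_a": 0.70, "pipeline_b": 0.85}  # assumed CV accuracies
R = {a: 0.0 for a in mean_reward}
rho, alpha, beta, epsilon = 0.0, 0.1, 0.05, 0.1

for _ in range(2000):
    greedy = max(R, key=R.get)
    a = random.choice(list(R)) if random.random() < epsilon else greedy
    r = mean_reward[a] + random.gauss(0, 0.05)          # noisy episode reward
    R[a] += alpha * (r - rho + max(R.values()) - R[a])  # R-learning update
    if a == greedy:                       # rho updated only on greedy steps
        rho += beta * (r - rho)

best_arm = max(R, key=R.get)
```

After enough episodes the higher-reward pipeline dominates and rho approaches its average accuracy, mirroring how repeated trials sharpen a data scientist's sense of what works.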
14. PEORL: Planning--Execution--Observation--Reinforcement-Learning
Define pipeline goals → find all satisfying plans → instantiate the shortest / highest-reward plan → update plan R-values.
The planner focuses future trials on plan components and overall pipelines with higher learned rewards until all plans are tried, accuracy is achieved, or time runs out.
Built on: BC Action Language, ASP (Clingo), scikit-learn, R-Learning.
15.
16. Evidence from Experiments
Best IMDB 300
bag of words, fastICA and stochastic gradient descent (SGD)
• Hashing vectorizer: ngram range = (1,2), lowercase = False
• FastICA: n components = 3
• SGD classifier: loss = log, penalty = l2
Best Polarity Dataset 2.0 (2000 movie reviews)
Cross validation accuracy of 0.84
• Hashing vectorizer: ngram range = (1,3), lowercase = True
• FastICA: n components = 3
• SGD classifier: loss = modified huber, penalty = elasticnet
Best Full IMDB dataset
Cross validation score of 0.88
• Hashing vectorizer: ngram range = (1,1), lowercase = False
• FastICA: n components = 3
• SGD classifier: loss = log, penalty = None
300 IMDB Docs – Top 5
300 IMDB Docs – Bottom 5
18. Rho value evolution
Pipeline A, B, C:
• A – B fixed
• C changes
* Episodes are sequential, not reflected below
Each pipeline is evaluated for 1..5 episodes of 5-fold cross-validation, 300 documents, 2 classes. Each episode updates the value ρ. (Figure: ρ vs. episode.)
22. References
F. Yang, A. Elkholy, S. Gustafson. Interpretable Automated Machine Learning
in Maana Knowledge Platform. 18th International Conference on
Autonomous Agents and Multiagent Systems (AAMAS), Montreal. Extended
Abstract, May, 2019.
D. Lyu, F. Yang, B. Liu, S. Gustafson. SDRL: Interpretable and Data-efficient
Deep Reinforcement Learning Leveraging Symbolic Planning. 33rd AAAI
Conference on Artificial Intelligence (AAAI), Honolulu, HI, 2019
F. Yang, S. Gustafson, A. Elkholy, D. Lyu, B. Liu. Program Search for Machine
Learning Pipelines Leveraging Symbolic Planning and Reinforcement
Learning. In Genetic Programming Theory and Practice XVI. 2018. Springer.
F. Yang, D. Lyu, B. Liu, S. Gustafson. PEORL: Integrating Symbolic Planning
and Hierarchical Reinforcement Learning for Robust Decision-Making. IJCAI.
Sweden. 2018.
23. AutoML
• Algorithm closely mirrors expert’s process, reasonable results
• Algorithm is naturally “human in the loop”
• Includes learning, via human input and reinforcement learning
• Anything else?
24. Digitization / Digital Decisions
AutoML has a knowledge representation of a digital decision
It allows you to think & reason about the decision before
making it
I have made AutoML before, but this time, I want to do it in a
way that aligns with digital decisions in general.
AutoML is simply a digital decision for picking a ML pipeline!
25. Canvas (derived from “to canvass”)
• A set of topics and questions that allow you to gather information about your business and strategy, reflect, brainstorm, and refine strategy
• We will use a four section Decision Canvas:
1. Define the problem or opportunity
2. Identify the decision strategy
3. Break down the decision
4. Define the solution as composable functions
26.
27. Given data with labels, what is the best model to predict label of new data?
28. Given data with labels, what is the best model to predict the label of new data?
• Data and methods that can be combined into a pipeline
• Pipeline with good cross-validation accuracy
• Shortest pipelines with low variability in accuracy
• Iterate over different pipelines
29. Given data with labels, what is the best model to predict the label of new data?
• Labeled data, user preferences on pipeline
• Data and methods that can be combined into a pipeline
• What pipeline steps have worked well, gotten closer to the goal (better CV results)
• Select next pipeline to try
• Iterate over different pipelines
• CV results; CV results plus a user action to stop
• Stop pipeline, set accuracy, constrain options
• Pipeline meets goal, best so far
• Pipeline with good cross-validation accuracy
• Shortest pipelines with low variability in accuracy
30. model = best( … ( learn( score( plan( input data, user preferences ) ) ) ) )
where … is an iteration of learn(score(plan( ))) until all plans are tried or a target accuracy is met.
• model: given input data and user preferences, what is the best pipeline
• plan: given input data, user preferences, and known pipeline-element performance, what are the possible pipelines, ordered by potential performance and length
• score: given a potential pipeline, what is its accuracy
• learn: given pipeline performance, what is the pipeline-element performance
• best: given known pipeline accuracies, which is the best one
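A sketch of this loop in Python; the pipelines and their CV accuracies are a hypothetical lookup table standing in for real cross-validation, and the element-credit rule in learn is an assumed simplification of the R-value update:

```python
# Hypothetical CV accuracies per pipeline (stand-in for running score()).
ASSUMED_CV = {
    ("tfidf", "svd", "logreg"): 0.81,
    ("tfidf", "none", "sgd"): 0.84,
    ("count", "svd", "sgd"): 0.79,
}
element_value = {}  # learned per-element performance

def plan(tried):
    """Order untried pipelines by the learned value of their elements."""
    untried = [p for p in ASSUMED_CV if p not in tried]
    return sorted(untried,
                  key=lambda p: -sum(element_value.get(e, 0.0) for e in p))

def score(pipeline):
    return ASSUMED_CV[pipeline]  # a real system runs cross-validation here

def learn(pipeline, cv):
    for e in pipeline:  # credit every element with the pipeline's result
        element_value[e] = 0.5 * element_value.get(e, cv) + 0.5 * cv

def best(results):
    return max(results, key=results.get)

results, target = {}, 0.83
while plan(results):  # iterate until all plans tried or target accuracy met
    p = plan(results)[0]
    results[p] = score(p)
    learn(p, results[p])
    if results[p] >= target:
        break
model = best(results)
```

Each pass through the loop focuses the next trial on pipelines whose elements have scored well so far, so the target can be met without evaluating every plan.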
31. Example Digitization : Should I bring my
umbrella?
• Traditionally, I would only observe the weather report, but I can now
combine this with my online calendar to decide if I’ll be outside
• It stands to reason that I should bring an umbrella if I’ll be outside long
when it is most likely to rain
• An important meeting, a long distance to walk, or having a lot of other things to carry will all factor into the decision about bringing an umbrella.
• I want to learn to predict what to bring better, a better estimate of
walking times, and learn to manage my daily activities better in
general.
• Optimizing a decision (bring umbrella) extends previous data (weather
report) and fills in missing data (walking times), useful for other
opportunities.
32. Given today’s activities, should I bring my umbrella? (main PQ)
• Given activities and step-monitor data, when can I assume I am outside? (predict time outside based on step data)
• Given time outside and the weather forecast, what is the likelihood of getting wet? (combine outside and weather prediction)
• Given the likelihood of getting wet and activities, when do I accept the recommendation to bring the umbrella? (learn the judgement decision to bring the umbrella (Y/N) as conditioned on wet likelihood and activities)
• Today’s activities, weather predictions
• Am I outside when it’s raining? Will being wet matter? Cost of carrying it?
• Don’t get caught out in the rain
• What activities, when will I be outside, based on steps data
• Carry umbrella given the day’s activities?
• Bring umbrella?
• Happy with advice to bring umbrella – sent by text
• Day’s activities (locations and times), weather service
• Activities (name and time) and activity step-monitor data
• Reply to text is Yes or No. A No is used to train a function on the decision to send the text.
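The decomposition on this canvas can be sketched as composable functions; every helper name, threshold, and data value below is a made-up illustration of the shape of the solution, not the author's implementation:

```python
def time_outside(activities, step_data):
    """Predict hours outside from calendar activities plus step-monitor data
    (assumption: an activity with many steps happened outdoors)."""
    return sum(hours for name, hours in activities
               if step_data.get(name, 0) > 1000)

def wet_likelihood(hours_outside, rain_probability):
    """Combine predicted exposure and the forecast into a chance of getting wet."""
    return min(1.0, hours_outside * rain_probability)

def bring_umbrella(likelihood, important_meeting):
    """Judgement step; the real canvas learns this from Yes/No text replies."""
    threshold = 0.3 if important_meeting else 0.5  # assumed thresholds
    return likelihood >= threshold

# Illustrative day: 1.0h commute and a 0.5h walk register as outdoors.
activities = [("commute", 1.0), ("desk work", 6.0), ("walk to dinner", 0.5)]
steps = {"commute": 4000, "desk work": 200, "walk to dinner": 2500}
decision = bring_umbrella(
    wet_likelihood(time_outside(activities, steps), rain_probability=0.4),
    important_meeting=True)
```

Because each sub-question is its own function, each can be improved or retrained independently, which is the point of breaking the decision down on the canvas.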
33. What do you take away?
Observe my arguments about AutoML and the algorithm
Reason about the evidence, consider past experience
Decide to change your Data Science approach
Learn by experimentation and feedback
34. Team
Fangkai Yang (NVIDIA) Prof. Bo Liu (Auburn) Daoming Lyu (Auburn) Alexander Elkholy Krishnan Ram (intern)
Jeremy Brown Sergey Ilinskiy