8.
ML 101: What is Machine Learning?
What: “Field of study that gives computers the ability to learn
without being explicitly programmed” – A. Samuel, 1959
How: Generalizing (learning) from examples (data)
Simple ML workflow:
– EXPLORE data
– FIT models based on data
– APPLY models in production
– VALIDATE models
– REPEAT
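The workflow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the toolkit's implementation: the data is synthetic, and the `fit`, `apply` and `validate` helpers are hypothetical names chosen to mirror the steps on the slide.

```python
import random
import statistics

# Hypothetical data: y is roughly 3*x + 2 plus noise (made up for illustration).
random.seed(0)
data = [(x, 3 * x + 2 + random.gauss(0, 1)) for x in range(100)]

# EXPLORE: look at basic summary statistics before modeling.
xs = [x for x, _ in data]
ys = [y for _, y in data]
print("mean x:", statistics.mean(xs), "mean y:", round(statistics.mean(ys), 2))

# Hold out part of the data so validation uses examples the model has not seen.
random.shuffle(data)
train, test = data[:80], data[80:]

def fit(rows):
    """FIT: ordinary least squares for a single feature."""
    mx = statistics.mean(x for x, _ in rows)
    my = statistics.mean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx

def apply(model, x):
    """APPLY: use the fitted model on a new input."""
    slope, intercept = model
    return slope * x + intercept

def validate(model, rows):
    """VALIDATE: mean absolute error on held-out rows."""
    return statistics.mean(abs(apply(model, x) - y) for x, y in rows)

model = fit(train)
mae = validate(model, test)
print("slope, intercept:", model, "held-out MAE:", round(mae, 2))
# REPEAT: refit when new data arrives or when the validation error drifts.
```

Validating on held-out rows rather than on the training rows is the design choice that matters here: it estimates how the model behaves on data it has not seen.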
9.
Learning From Data
[Prediction]
• When we see thick clouds and an overcast sky, we
predict that it’s likely going to rain
[Estimation/Regression]
• Estimate how much an apartment costs based on its
location, condition and prices of properties in that
neighborhood
[Classification/Clustering]
• Determine the gender of a person based on her/his
features, hair style and the way s/he dresses
[Anomaly Detection]
• Identify the odd one out
[Reinforcement Learning]
• If I made a mistake this time, can I do better next time?
All of us have had some experience in learning. But… what’s behind our experience?
How do we translate that knowledge to code?
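As one possible answer to that question, here is a sketch of the "odd one out" example in Python. It is only an illustration under stated assumptions: the readings are made up, `odd_ones_out` is a hypothetical helper, and the rule used (a modified z-score based on the median absolute deviation, with the common rule-of-thumb threshold of 3.5) is one of many ways to flag anomalies.

```python
import statistics

def odd_ones_out(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all values (nearly) identical; nothing stands out
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]  # hypothetical sensor values
print(odd_ones_out(readings))
```

The median is used instead of the mean so that the anomaly itself does not distort the baseline it is compared against.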
10.
Major Types of Machine Learning
1. Supervised Learning: generalizing from labeled data
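A minimal sketch of generalizing from labeled data, assuming a k-nearest-neighbors rule (chosen here only because it is short; the slide does not name an algorithm). The weather examples and the `predict` helper are hypothetical.

```python
import math

# Hypothetical labeled examples: (cloud cover, humidity) as fractions -> label.
labeled = [
    ((0.9, 0.8), "rain"),
    ((0.8, 0.9), "rain"),
    ((0.7, 0.7), "rain"),
    ((0.2, 0.3), "dry"),
    ((0.1, 0.4), "dry"),
    ((0.3, 0.2), "dry"),
]

def predict(point, examples, k=3):
    """Label a new point by majority vote among its k nearest labeled examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

print(predict((0.85, 0.75), labeled))  # a thick, humid sky
print(predict((0.15, 0.35), labeled))  # a clear, dry sky
```

The labels are what make this supervised: the model never has to discover what "rain" means, only which known examples a new point resembles.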
11.
Major Types of Machine Learning
2. Unsupervised Learning: generalizing from unlabeled data
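A minimal sketch of generalizing from unlabeled data, assuming k-means clustering in one dimension (an illustrative choice; the slide does not name an algorithm). The response times and the `kmeans_1d` helper are hypothetical.

```python
import statistics

def kmeans_1d(values, k=2, iterations=20):
    """Lloyd's algorithm in one dimension; returns the sorted cluster centers."""
    ordered = sorted(values)
    # Simple deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [statistics.mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

response_times_ms = [12, 15, 11, 14, 13, 210, 190, 205, 198]  # hypothetical, no labels
print(kmeans_1d(response_times_ms))
```

No example is marked "fast" or "slow" in the input; the two groups emerge from the values alone.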
12.
Major Types of Machine Learning
3. Reinforcement Learning:
• System is rewarded (or punished) based on the outcomes it generates
• Action leads to a change in the state of the world and generates an error score
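The reward-and-punish loop of reinforcement learning can be sketched with an epsilon-greedy multi-armed bandit, a deliberately simplified setting with no world state. Everything here is an assumption for illustration: the `run_bandit` helper, the reward probabilities, and the use of a 0/1 reward in place of the error score mentioned on the slide.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from the rewards received."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)       # how often each action was tried
    estimates = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore: try something random
        else:
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])  # exploit
        # The environment rewards (1.0) or punishes (0.0) the chosen action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])  # hypothetical payoff rates
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times each action was chosen:", counts)
```

The small exploration rate is what lets the system recover from an early mistake: without it, the first action that happened to pay off would be repeated forever.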
34.
What Else?
• Get the Machine Learning Toolkit from Splunkbase
• Watch the machine learning videos on the Splunk YouTube channel:
http://tiny.cc/splunkmlvideos
• Watch the machine learning talks from .conf2016:
– Advanced Machine Learning in SPL with the Machine Learning Toolkit by Jacob Leverich
– Extending SPL with Custom Search Commands and the Splunk SDK for Python by Jacob Leverich
• Early Adopter and Customer Advisory Program: mlprogram@splunk.com
• Field ML Architects: Andrew Stein (astein@), Brian Nash (bnash@)
36.
What’s New since our 0.9 Beta Release (last year’s .conf)?
• New name and abbreviation ;-)
• No event limits (removal of 50K limit on fitting models)
• Configurable resource caps via mlspl.conf
• Search head clustering support
• Distributed / streaming apply
• Scheduled fit
• New algorithms (next slide)
– Feature engineering and selection
– Stochastic gradient descent (e.g.)
– ARIMA
• Multi-algorithm support across Assistants
• Scatterplot matrix viz
• Alerting
• Tooltips
• In-app tours
• Cluster Numeric Events assistant
• Videos, videos, videos for each assistant across IT, Security, IoT and Business Analytics
• ML-SPL Cheat Sheet