Gary Hope is currently the Data Platform Technical Specialist at Microsoft South Africa, having previously worked for several large organisations, including American Express and Siemens Business Solutions.
Slides from talks presented at Mammoth BI in Cape Town on 17 November 2014.
Visit www.mammothbi.co.za for details on the event. Follow @MammothBI on Twitter.
Gary Hope - Machine Learning: It's Not as Hard as you Think
1. THE WORLD WE LIVE IN
Speaker 4 of 17
Gary Hope
@GaryHope
Machine Learning – It’s Not as Hard as You Think
Followed by
Gillian Staniland
2. What is Machine Learning?
Data and Decisions
Data Science Workflow for
Machine Learning
3. What is Machine Learning?
“Delivering on one of the old dreams of Microsoft co-founder Bill Gates: Computers that can see, hear and understand.”
John Platt, Distinguished scientist at Microsoft Research
“A breakthrough in machine learning would be worth ten Microsofts.”
Bill Gates
Predictive computing systems that become smarter with experience
4. Me, Microsoft & Machine Learning
15 years of realizing innovation
1999 2004 2005 2008 2010 2012 2014
• SQL Server enables data mining
• Computers work on users’ behalf, filtering junk email
• Microsoft Kinect can watch users’ gestures
• Microsoft launches Azure Machine Learning
• Microsoft search engine built with machine learning
• Bing Maps ships with ML traffic-prediction service
• Successful, real-time, speech-to-speech translation
“Machine learning is pervasive throughout Microsoft products.”
John Platt, Distinguished scientist at Microsoft Research
8. When presented with information we tell
ourselves stories, we have biases and we
have a very low level of intuitive
understanding of statistical information
(that’s not to say we can’t spend the effort to analyze)
9. “Any sufficiently advanced technology is indistinguishable from magic.”
Arthur C. Clarke, 1961
If and to what extent the magic of Machine Learning changes YOUR world depends on how YOU use it!
If you are not actually using the
data available to make
systematic decisions in your
business, you will mostly be
guessing or, at best, relying
heavily on potentially biased
intuition.
10. The United States Postal Service
processed over 150 billion pieces
of mail in 2013—far too much for
efficient human sorting.
But as recently as 1997, only
10% of hand-addressed mail was
successfully sorted automatically.
11. The challenge in automation is
enabling computers to interpret
endless variation in handwriting.
12. By providing feedback, the Postal
Service was able to train
computers to accurately read
human handwriting.
Today, with the help of machine
learning, over 98% of all mail is
successfully processed by
machines.
14. Smart Buildings: IoT and ML example
The Center for Building Performance and
Diagnostics uses weather forecasts, real-time
temperature reads, and behavioral research data
to optimize building heating and cooling systems
in real-time.
Key Benefits
• User friendly set up and integration with existing systems
• Seamless data handling
• Accessible and easy to use across backgrounds
• Quickly compare algorithms
“The ease of implementation makes machine learning accessible to a larger number of investigators with various backgrounds, even non-data scientists.”
Bertrand Lasternas
Carnegie Mellon
15. Using past data to predict the future
Imagine what machine learning could do for your business.
• Churn analysis
• Equipment monitoring
• Spam filtering
• Ad targeting
• Recommendation
• Fraud detection
• Image detection & classification
• Forecasting
• Anomaly detection
16. Common Classes of Problems
• Classification
• Regression
• Recommenders
• Anomaly Detection
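As an illustration of one of these problem classes, the following is a minimal anomaly-detection sketch. It is not taken from the talk: the z-score rule, the function name find_anomalies, the threshold of 2.5 and the temperature readings are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.5):
    """Return the indexes of values lying more than `threshold`
    standard deviations away from the mean (a simple z-score rule)."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values)
            if abs(v - centre) / spread > threshold]


# Hypothetical temperature readings; the last one is the odd one out.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 19.7, 20.1, 20.0, 35.2]
anomalies = find_anomalies(readings)
print(anomalies)  # prints [9]
```

A z-score rule is only one of many possible approaches; it works best when the normal readings cluster tightly and the sample is large enough for a single outlier not to dominate the standard deviation.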
18. Machine Learning Problem Requirements
Available data
• Related to the decision
• Historical
• Outcomes
Valuable business problem
involving a decision
– Existing process
– Metrics
19. Universal Machine Learning Flow
• Define Objective
– Measurable and has supporting data
• Collect & Prepare Data
– Flatten schema
– Normalize and common scale
– Feature selection
– Sample and split
• Train Model
– Algorithm selection
– Parameter sweeping
• Analyze Results
– Score, evaluate and visualize
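The four steps of the flow can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the tooling shown in the talk: the churn data is randomly generated, the feature names are made up, and the k-nearest-neighbours classifier, the min-max scaling and the helper names (min_max_scale, split, knn_predict, accuracy) are assumptions picked to keep the example self-contained.

```python
import random


def min_max_scale(rows):
    """Rescale every feature column to the [0, 1] range (a common scale)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split(rows, labels, holdout=0.25, seed=0):
    """Shuffle the examples, then hold out a fraction of them."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * (1 - holdout))
    keep, held = order[:cut], order[cut:]
    return ([rows[i] for i in keep], [labels[i] for i in keep],
            [rows[i] for i in held], [labels[i] for i in held])


def knn_predict(train_x, train_y, point, k):
    """Majority vote among the k training examples closest to `point`."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], point)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train_x, train_y, eval_x, eval_y, k):
    """Fraction of evaluation examples that are classified correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(eval_x, eval_y))
    return hits / len(eval_x)


# 1. Define objective: predict churn (1) versus no churn (0).
#    Made-up data with two features: monthly spend and support calls.
rng = random.Random(42)
rows, labels = [], []
for _ in range(200):
    churn = rng.random() < 0.5
    rows.append([rng.gauss(30 if churn else 80, 10),
                 rng.gauss(6 if churn else 2, 1)])
    labels.append(int(churn))

# 2. Collect & prepare data: normalize, then sample and split.
scaled = min_max_scale(rows)
fit_x, fit_y, test_x, test_y = split(scaled, labels, seed=1)
train_x, train_y, val_x, val_y = split(fit_x, fit_y, seed=2)

# 3. Train model: sweep the parameter k on the validation split.
sweep = {k: accuracy(train_x, train_y, val_x, val_y, k) for k in (1, 3, 5, 7)}
best_k = max(sweep, key=sweep.get)

# 4. Analyze results: score the chosen setting on the untouched test split.
test_accuracy = accuracy(fit_x, fit_y, test_x, test_y, best_k)
print(f"validation accuracy by k: {sweep}")
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The parameter sweep is scored on a separate validation split so that the final test accuracy is measured on data that played no part in choosing k.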
27. Put ML into Production
Technically make available as
a published service
Share usage and outcome
information inside of the
organization.
Define → Prepare → Train → Analyze → Publish → Use → Monitor
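The publish, use and monitor steps can be sketched as a thin wrapper around a trained model. This is a hypothetical illustration, not the published-service mechanism from the talk: the ScoringService class, its method names and the toy threshold model are all assumptions, and a real deployment would expose score() behind a web endpoint and persist the log.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ScoringService:
    """Wraps a model so that every use is logged and can be monitored."""
    model: Callable[[Any], Any]
    log: List[dict] = field(default_factory=list)

    def score(self, features):
        """Use: return a request id and the model's prediction."""
        prediction = self.model(features)
        self.log.append({"features": features,
                         "prediction": prediction,
                         "outcome": None})
        return len(self.log) - 1, prediction

    def record_outcome(self, request_id, outcome):
        """Feed back what actually happened for an earlier request."""
        self.log[request_id]["outcome"] = outcome

    def accuracy(self) -> Optional[float]:
        """Monitor: share of logged predictions that matched the outcome."""
        known = [e for e in self.log if e["outcome"] is not None]
        if not known:
            return None
        return sum(e["prediction"] == e["outcome"] for e in known) / len(known)


# Publish: a toy stand-in model that flags customers with many support calls.
service = ScoringService(model=lambda calls: int(calls > 5))

# Use: score three customers, then record what really happened.
first_id, first_pred = service.score(8)
second_id, second_pred = service.score(2)
third_id, third_pred = service.score(7)
service.record_outcome(first_id, 1)
service.record_outcome(second_id, 0)
service.record_outcome(third_id, 0)  # the model got this one wrong

live_accuracy = service.accuracy()
print(f"requests logged: {len(service.log)}, live accuracy: {live_accuracy:.2f}")
```

Keeping predictions and outcomes side by side is what makes it possible to share usage and outcome information inside the organization and to notice when a model needs retraining.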