Machine Learning Loves Hadoop
2. 2
Agenda
©2014 Cloudera, Inc. All rights reserved.
Hadoop and Cloudera Overview
Machine Learning + The Enterprise Data Hub
Machine Learning in Practice
Q&A
Speakers
TJ Laher
Product Marketing
Sean Owen
Director of Data Science
Get Social
#ClouderaWebinars
3. 3
Where Hadoop Began
2006: Web Indexing, Google Earth, Google Finance
2008: Web Indexing
2010: Storing User Generated Data
4. 4
How Cloudera Accelerated Adoption
2008 2009 2010 2011 2012 2013 2014
CDH
Cloudera Manager
Cloudera Enterprise 4
Ask Bigger Questions
Enterprise Data Hub
Cloudera Launched
Hadoop Creator, Doug Cutting, Joins
Launch CDH: 1st Commercial Hadoop Distro
Launch Cloudera Manager: 1st Hadoop Management Application
Cloudera U Expands to 140 Countries
100 Customers in Production
Release Cloudera Enterprise 4
300 Partners in Cloudera Connect
Introduce Cloudera Navigator, Impala, Search
Realized the Enterprise Data Hub
Tom Reilly Joins as CEO
5. 5
The Enterprise Data Hub
EDH powered by Apache Hadoop™
Unified
Out-of-the-box capabilities for infinite scalability for storage, ingest, access, metadata, security, governance, and management
Compliance-Ready
End-to-end security and governance: authentication, authorization, encryption, audit, and lineage
Accessible
Utilize familiar tools and skills to get value from your data faster
Multiple frameworks, including batch and stream processing, in-memory analytic SQL, enterprise search, machine learning
Open
100% open source: all components are Apache licensed
Deploy in the cloud, on-premises, or with an appliance
Social
Financial
Transactions
Sensor
OR
6. 6
What does an EDH look like?
Applications: Model Building, BI/Visualizations, Point Solutions, Custom Solutions
Processing: Online NoSQL DBMS, Analytic MPP DBMS, Search Engine, Batch Processing, Stream Processing, Machine Learning
Management & Storage: Unified Management & Distributed Storage
Data Sources
Management, Security & Governance, Metadata, Data
8. 8
Why do we use machine learning?
Operational Analytics: Transaction Classification, Recommendation Engine, Dynamic Pricing, …
Investigative Analytics: Drug Discovery, Energy Exploration, Executive Reports, …
9. 9
Machine Learning Breakdown
Type | Category | Algorithm | Goal
Supervised | Classification | Logistic Regression & Random Decision Forest | Pattern Recognition
Supervised | Regression | Generalized Linear Models | Predict Future Values
Unsupervised | Clustering | K-means++ | Segment Historic Data
Unsupervised | Collaborative Filtering | Alternating Least Squares | Recommend Items
10. 10
Common Challenges with Machine Learning
Challenges
Feature Generation & Selection
Overfitting
Historic Testing
Dirty Data
Debugging Models
The Cost (Traditional Systems)
Time
False Positives and Negatives
Uncertainty of Model Quality
Unable to Explain and Improve Models
Bad Results
11. 11
How an Enterprise Data Hub Helps
Challenges
Feature Generation & Selection
Overfitting
Historic Testing
Dirty Data
Debugging Models
The Benefit (Enterprise Data Hub)
Reduce Iteration Time
Eliminate Sampling
Test on Archived Data
Audit Data Trail
Immediate Data Access
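Testing on archived data, one of the benefits named on this slide, can be illustrated with a small back-test. The sketch below is not from the deck: the single-feature threshold model, the `fit_threshold` and `confusion_counts` helpers, and the sample records are all hypothetical. It fits on the older records, scores the newer ones, and counts false positives and false negatives.

```python
def fit_threshold(train):
    """Toy model (an assumption): midpoint between the two class means."""
    pos = [x for x, label in train if label]
    neg = [x for x, label in train if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["tp"] += 1
        elif p and not a:
            counts["fp"] += 1
        elif a:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical archived records, oldest first: (feature value, label).
archived = [
    (12.0, False), (15.0, False), (95.0, True),
    (11.0, False), (88.0, True), (14.0, False),
    (13.0, False), (91.0, True), (60.0, False), (40.0, True),
]

# Time-ordered split: fit on the older records, test on the newer ones.
split = 6
train, test = archived[:split], archived[split:]

threshold = fit_threshold(train)
predicted = [x > threshold for x, _ in test]
actual = [label for _, label in test]
counts = confusion_counts(actual, predicted)
print(threshold, counts)
```

Splitting by time rather than at random keeps the test closer to how the model would meet new data in production.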
13. 13
Fraud Detection
Data: Credit Card Transactions
Algorithm: K-means++
Outcome: A machine learning model reduces false negatives, saving organizations millions of dollars in fraud losses.
Offline to online: Distributed Storage, Modeling, Rules Engine
Cloudera Navigator
Management, Security & Governance, Metadata, Data
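The slide names K-means++ but does not say how the clusters are turned into fraud signals. One common approach, assumed here, is to cluster normal transactions offline and score new ones by their distance to the nearest cluster center. The sketch below is a pure-Python illustration of that idea; the features (amount, hour of day), the sample data, and the flagging rule are hypothetical, and the features are left unscaled for brevity.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: draw each new center with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights)[0])
    return centers


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; leave empty clusters in place.
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


def anomaly_score(p, centers):
    """Distance from a transaction to its nearest cluster center."""
    return math.sqrt(min(sq_dist(p, c) for c in centers))


# Hypothetical normal transactions: (amount in dollars, hour of day).
normal = [(20.0 + i % 5, 12.0 + i % 3) for i in range(30)]
normal += [(200.0 + i % 7, 20.0 + i % 2) for i in range(30)]

centers = kmeans(normal, k=2)

# Assumed rule: flag anything farther from a center than any normal point.
threshold = max(anomaly_score(p, centers) for p in normal)
suspicious = (5000.0, 3.0)
is_flagged = anomaly_score(suspicious, centers) > threshold
print(is_flagged)
```

In the offline/online split shown on the slide, the clustering would run offline and only the cheap distance check would sit in the online rules engine; that mapping is an interpretation, not something the deck spells out.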
14. 14
Product Recommendations
Data: Customer Purchases, Social Data
Algorithm: Alternating Least Squares
Outcome: A product recommendation engine, powered by a machine learning model, increases purchase conversion rates.
Offline to online: Distributed Storage, Modeling, Serve Value
Management, Security & Governance, Metadata, Data
Product #1
Product #2
Product #3
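Alternating Least Squares factors a sparse user-by-product matrix into user and product vectors, solving for one side while the other is held fixed. The sketch below is a minimal pure-Python version for illustration only, not the implementation used in the deck. The users, the scores, the extra "Product #4", and the `rank`, `lam`, and `iters` values are all assumptions.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def update(fixed, observed, rank, lam):
    """Regularized least-squares update of one side, other side fixed.
    observed maps an id to its list of (other id, score) pairs."""
    out = {}
    for idx, pairs in observed.items():
        A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, score in pairs:
            v = fixed[other]
            for i in range(rank):
                b[i] += score * v[i]
                for j in range(rank):
                    A[i][j] += v[i] * v[j]
        out[idx] = solve(A, b)
    return out


def als(ratings, rank=2, lam=0.01, iters=50, seed=0):
    """Alternate between user and product updates for a fixed iteration count."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), score in ratings.items():
        by_user.setdefault(u, []).append((i, score))
        by_item.setdefault(i, []).append((u, score))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update(items, by_user, rank, lam)
        items = update(users, by_item, rank, lam)
    return users, items


def predict(users, items, u, i):
    """Predicted score is the dot product of the two factor vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


# Hypothetical purchase-affinity scores; bob and dan each miss one product.
ratings = {
    ("ann", "Product #1"): 5, ("ann", "Product #2"): 4,
    ("ann", "Product #3"): 1, ("ann", "Product #4"): 2,
    ("bob", "Product #1"): 5, ("bob", "Product #2"): 4,
    ("bob", "Product #4"): 2,
    ("cat", "Product #1"): 1, ("cat", "Product #2"): 2,
    ("cat", "Product #3"): 5, ("cat", "Product #4"): 4,
    ("dan", "Product #1"): 1, ("dan", "Product #3"): 5,
    ("dan", "Product #4"): 4,
}

users, items = als(ratings)
print(predict(users, items, "bob", "Product #3"))
print(predict(users, items, "dan", "Product #2"))
```

Because bob's known scores match ann's, the model should predict a low score for bob on Product #3, so that product would not be recommended to him.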
15. 15
Predictive Maintenance
Data: Machine Sensors
Algorithm: Logistic Regression
Outcome: A machine learning model alerts employees to early signs of machine failure, reducing onsite visits.
Offline to online: Sensor Data, Modeling, Custom Application
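As a rough illustration of how logistic regression could drive such alerts, the sketch below trains a small model with batch gradient descent on made-up sensor readings. The two features (normalized temperature and vibration), the sample data, the learning rate, and the 0.5 alert threshold are all assumptions for the example, not details from the deck.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def failure_probability(w, b, reading):
    """Predicted probability that a machine with this reading fails."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, reading)) + b)


# Hypothetical readings: (normalized temperature, normalized vibration).
healthy = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15), (0.35, 0.1), (0.1, 0.3)]
failing = [(0.8, 0.9), (0.7, 0.8), (0.9, 0.7), (0.85, 0.95), (0.75, 0.65)]
X = healthy + failing
y = [0] * len(healthy) + [1] * len(failing)

w, b = train(X, y)

# Assumed alert rule: notify staff when the failure probability reaches 0.5.
p_hot = failure_probability(w, b, (0.9, 0.9))
p_cool = failure_probability(w, b, (0.1, 0.1))
print(p_hot >= 0.5, p_cool >= 0.5)
```

Training would run offline on stored sensor data, while the custom application would only need the learned weights to score new readings as they arrive.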
17. 17
What’s Next?
TJ Laher
tlaher@cloudera.com
Sean Owen
sowen@cloudera.com
Contact Us
@Cloudera
1-866-843-7207
Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until Sept ’14*
Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until Sept ’14*
Register now for Data Analyst, Spark, or Data Science training at http://university.cloudera.com