1
Machine Learning Loves Hadoop
Enabling Machine Learning to Accelerate Data Returns
2
Agenda
©2014 Cloudera, Inc. All rights reserved.
Hadoop and Cloudera Overview
Machine Learning + The Enterprise Data Hub
Machine Learning in Practice
Q&A
Speakers
TJ Laher
Product Marketing
Sean Owen
Director of Data Science
Get Social
#ClouderaWebinars
3
Where Hadoop Began
Web Indexing, Google Earth,
Google Finance
Web Indexing Storing User Generated Data
2006 2008 2010
4
How Cloudera Accelerated Adoption
2008 2009 2010 2011 2012 2013 2014
CDH
Cloudera Manager
CLOUDERA ENTERPRISE 4
ENTERPRISE DATA HUB
ASK BIGGER QUESTIONS
Cloudera Launched
Hadoop Creator, Doug Cutting, Joins
Launch CDH: 1st Commercial Hadoop Distro
Launch Cloudera Manager: 1st Hadoop Management Application
Cloudera U Expands to 140 Countries
100 Customers in Production
Release Cloudera Enterprise 4
300 Partners in Cloudera Connect
Introduce Cloudera Navigator, Impala, Search
Realized the Enterprise Data Hub
Tom Reilly Joins as CEO
5
The Enterprise Data Hub
EDH powered by Apache Hadoop™
Unified
Out-of-the-box capabilities for infinite scalability for storage, ingest, access, metadata, security, governance, and management
Compliance-Ready
End-to-end security and governance: authentication, authorization, encryption, audit, and lineage
Accessible
Utilize familiar tools and skills to get value from your data faster
Multiple frameworks, including batch and stream processing, in-memory analytic SQL, enterprise search, machine learning
Open
100% open source – all components are Apache licensed
Deploy in the cloud, on-premises, or with an appliance
Social
Financial
Transactions
Sensor
OR
6
What does an EDH look like?
[Diagram: EDH architecture. Applications: Model Building, BI/Visualizations, Point Solutions, Custom Solutions. Processing: Batch Processing, Stream Processing, Analytic MPP DBMS, Search Engine, Online NoSQL DBMS, Machine Learning. Unified Management & Distributed Storage: Management, Security & Governance, Metadata, Data. Data Sources.]
7
Machine Learning + The Enterprise Data Hub
8
Why do we use machine learning?
Operational Analytics: Transaction Classification, Recommendation Engine, Dynamic Pricing, …
Investigative Analytics: Drug Discovery, Energy Exploration, Executive Reports, …
9
Machine Learning Breakdown
Category | Algorithm | Goal
Classification | Logistic Regression & Random Decision Forest | Pattern Recognition
Regression | Generalized Linear Models | Predict Future Values
Clustering | K-means++ | Segment Historic Data
Collaborative Filtering | Alternating Least Squares | Recommend Items
Learning types: Supervised, Unsupervised
10
Common Challenges with Machine Learning
Challenges The Cost
Time
False Positives and Negatives
Uncertainty of Model Quality
Unable to Explain and Improve Models
Bad Results
Traditional
Systems
Feature Generation & Selection
Overfitting
Historic Testing
Dirty Data
Debugging Models
11
How an Enterprise Data Hub Helps
Challenges The Benefit
Enterprise
Data Hub
Reduce Iteration Time
Eliminate Sampling
Test on Archived Data
Audit Data Trail
Immediate Data Access
Feature Generation & Selection
Overfitting
Historic Testing
Dirty Data
Debugging Models
12
Machine Learning in Practice
13
Fraud Detection
Data: Credit Card Transactions
Algorithm: K-means++
Outcome: The machine learning model reduces false negatives, saving organizations millions of dollars in fraud loss.
[Diagram labels: Offline, Online; Distributed Storage, Modeling, Rules Engine; Cloudera Navigator; Management, Security & Governance, Metadata, Data]
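The slide names K-means++ as the algorithm but does not say how a transaction is flagged. A minimal sketch, assuming a transaction is scored by its distance to the nearest cluster centroid: the transaction values, the (amount, hour) features, and the names kmeans_pp_init, kmeans, and anomaly_score are all hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """Pick k starting centroids with k-means++ seeding."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Sample the next centroid with probability proportional to that distance.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def kmeans(points, k, iterations=20, seed=0):
    """Run Lloyd's iterations from a k-means++ start and return the centroids."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def anomaly_score(point, centroids):
    """Assumed scoring rule: distance to the nearest centroid."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical historical transactions: (amount in dollars, hour of day).
transactions = [
    (18.0, 11.0), (22.0, 13.0), (20.0, 12.0), (25.0, 14.0),
    (190.0, 19.0), (210.0, 21.0), (205.0, 20.0), (195.0, 22.0),
]
centroids = kmeans(transactions, k=2)
typical_score = anomaly_score((21.0, 12.0), centroids)
suspect_score = anomaly_score((5000.0, 3.0), centroids)
print(f"typical: {typical_score:.1f}, suspect: {suspect_score:.1f}")
```

In a deployment like the one on the slide, the clustering would presumably run offline over stored history and the score threshold would feed the online rules engine; features would also need scaling so that amount does not dominate the distance.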
14
Product Recommendations
Data: Customer Purchases, Social Data
Algorithm: Alternating Least Squares
Outcome: A product recommendation engine, powered by a machine learning model, increases purchase conversion rates.
[Diagram labels: Offline, Online; Distributed Storage, Modeling, Serve Value; Management, Security & Governance, Metadata, Data; Product #1, Product #2, Product #3]
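Alternating Least Squares can be illustrated with a deliberately simplified rank-1 version, where each user and each item gets a single latent number and each side is solved in closed form while the other is held fixed. This is a sketch under stated assumptions, not the implementation behind the slide: the ratings, the als_rank1 name, and the lam regularization value are hypothetical.

```python
def als_rank1(ratings, n_users, n_items, lam=0.01, iterations=20):
    """Rank-1 ALS: fit one latent factor per user and per item.

    ratings maps (user, item) -> observed score; unobserved pairs are absent.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Hold item factors fixed and solve the regularized least squares per user.
        for i in range(n_users):
            seen = [(item, r) for (user, item), r in ratings.items() if user == i]
            u[i] = sum(r * v[item] for item, r in seen) / (
                lam + sum(v[item] ** 2 for item, _ in seen)
            )
        # Hold user factors fixed and solve the regularized least squares per item.
        for j in range(n_items):
            seen = [(user, r) for (user, item), r in ratings.items() if item == j]
            v[j] = sum(r * u[user] for user, r in seen) / (
                lam + sum(u[user] ** 2 for user, _ in seen)
            )
    return u, v


# Hypothetical purchase scores for 3 users and 3 products.
# The (user 2, product 2) pair is unobserved and is what we want to predict.
ratings = {
    (0, 0): 2.0, (0, 1): 1.0, (0, 2): 3.0,
    (1, 0): 4.0, (1, 1): 2.0, (1, 2): 6.0,
    (2, 0): 3.0, (2, 1): 1.5,
}
user_factors, item_factors = als_rank1(ratings, n_users=3, n_items=3)
predicted = user_factors[2] * item_factors[2]
print(f"predicted score for user 2, product 2: {predicted:.2f}")
```

The toy scores were built from a single hidden factor, so the prediction should land close to 4.5; real purchase data would need a higher rank, which turns each closed-form update into a small linear solve.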
15
Predictive Maintenance
Data: Machine Sensors
Algorithm: Logistic Regression
Outcome: The machine learning model alerts employees to early signs of machine failure, reducing onsite visits.
[Diagram labels: Offline, Online; Sensor Data, Modeling, Custom Application]
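As a rough illustration of the logistic regression step, the sketch below fits a failure-probability model with plain batch gradient descent. The sensor readings, the two features (vibration and temperature, both scaled to 0..1), and the names train_logistic and failure_probability are assumptions made for the example, not details from the webinar.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Prediction error for this sample drives the gradient.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b


def failure_probability(reading, w, b):
    """Estimated probability that a machine with this reading is failing."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, reading)) + b)


# Hypothetical labeled history: (vibration, temperature), 1 = failed soon after.
readings = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.30), (0.30, 0.20),
    (0.80, 0.90), (0.90, 0.70), (0.70, 0.80), (0.85, 0.95),
]
failed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(readings, failed)
healthy_p = failure_probability((0.10, 0.10), w, b)
risky_p = failure_probability((0.90, 0.90), w, b)
print(f"healthy: {healthy_p:.2f}, risky: {risky_p:.2f}")
```

A custom application like the one on the slide could raise an alert when the probability crosses a chosen threshold; where to set that threshold depends on the cost of a missed failure versus an unnecessary visit.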
16
Q&A
17
What’s Next?
TJ Laher
tlaher@cloudera.com
Sean Owen
sowen@cloudera.com
Contact Us
@Cloudera
1-866-843-7207
Use discount code Analytics10 to save 10% on new enrollments in classes delivered by Cloudera until Sept ‘14*
Use discount code 15off2 to save 15% on enrollments in two or more classes delivered by Cloudera until Sept ‘14*
Register now for Data Analyst, Spark, or Data Science training at http://university.cloudera.com
