SlideShare uma empresa Scribd logo
1 de 26
Baixar para ler offline
Distributed Deep Learning for 
Classification and Regression 
Problems using H2O 
Arno Candel, H2O.ai 
MLconf SF 2014, San Francisco, 11/14/14 
Join us at H2O World 2014 | November 18th and 19th | Computer History Museum, Mountain View 
! 
Register now at http://h2oworld2014.eventbrite.com
Who am I? 
@ArnoCandel 
PhD in Computational Physics, 2005 
from ETH Zurich Switzerland 
! 
6 years at SLAC - Accelerator Physics Modeling 
2 years at Skytree - Machine Learning 
11 months at H2O.ai - Machine Learning 
! 
15 years in HPC/Supercomputing/Modeling 
! 
Named “2014 Big Data All-Star” by Fortune Magazine 
!
H2O Deep Learning, @ArnoCandel 
Outline 
Intro (5 mins) 
Methods & Implementation (5 mins) 
Results & Live Demos (15 mins) 
Higgs boson classification 
MNIST handwritten digits 
text classification 
3
H2O Deep Learning, @ArnoCandel 
Teamwork at H2O.ai 
Java, Apache v2 Open-Source 
#1 Java Machine Learning in Github 
Join the community! 
4
H2O Deep Learning, @ArnoCandel 
H2O: Open-Source (Apache v2) 
Predictive Analytics Platform 
5
H2O Deep Learning, @ArnoCandel 6 
H2O Architecture - Designed for speed, 
scale, accuracy & ease of use 
Key technical points: 
• distributed JVMs + REST API 
• no Java GC issues 
(data in byte[], Double) 
• loss-less number compression 
• Hadoop integration (v1,YARN) 
• R package (CRAN) 
Pre-built fully featured algos: 
K-Means, NB, GLM, RF, GBM, DL
H2O Deep Learning, @ArnoCandel 
What is Deep Learning? 
Wikipedia: 
Deep learning is a set of algorithms in 
machine learning that attempt to model 
high-level abstractions in data by using 
architectures composed of multiple 
non-linear transformations. 
Input: 
Image 
Output: 
User ID 
7 
Example: Facebook DeepFace
H2O Deep Learning, @ArnoCandel 
What is NOT Deep 
Linear models are not deep 
(by definition) 
! 
Neural nets with 1 hidden layer are not deep 
(only 1 layer - no feature hierarchy) 
! 
SVMs and Kernel methods are not deep 
(2 layers: kernel + linear) 
! 
Classification trees are not deep 
(operate on original input space, no new features generated) 
8
H2O Deep Learning, @ArnoCandel 
H2O Deep Learning 
1970s multi-layer feed-forward Neural Network 
(stochastic gradient descent with back-propagation) 
! 
+ distributed processing for big data 
(fine-grain in-memory MapReduce on distributed data) 
! 
+ multi-threaded speedup 
(async fork/join worker threads operate at FORTRAN speeds) 
! 
+ smart algorithms for fast & accurate results 
(automatic standardization, one-hot encoding of categoricals, missing value imputation, weight & 
bias initialization, adaptive learning rate, momentum, dropout/l1/L2 regularization, grid search, 
N-fold cross-validation, checkpointing, load balancing, auto-tuning, model averaging, etc.)s 
! 
= powerful tool for (un)supervised 
machine learning on real-world data 
9 
all 320 cores maxed out
H2O Deep Learning, @ArnoCandel 
H2O Deep Learning Architecture 
K-V 
HTTPD 
nodes/JVMs: sync 
threads: async 
communication 
K-V 
HTTPD 
w 
1 
w w 
2 
1 
w w w w 
1 3 2 4 
w1 w3 w2 
w4 
3 2 
w w2+w4 1+w3 
4 
1 2 
w* = (w1+w2+w3+w4)/4 
map: 
each node trains a 
copy of the weights 
and biases with 
(some* or all of) its 
local data with 
asynchronous F/J 
threads 
initial model: weights and biases w 
1 
1 
updated model: w* 
H2O atomic 
in-memory 
K-V store 
reduce: 
model averaging: 
average weights and 
biases from all nodes, 
speedup is at least 
#nodes/log(#rows) 
arxiv:1209.4129v3 
i 
Query & display 
the model via 
JSON, WWW 
Keep iterating over the data (“epochs”), score from time to time 
*auto-tuned (default) or user-specified number of points per MapReduce iteration 
10
H2O Deep Learning, @ArnoCandel 11 
Application: Higgs Boson Classification 
Large Hadron Collider: Largest experiment of mankind! 
$13+ billion, 16.8 miles long, 120 MegaWatts, -456F, 1PB/day, etc. 
Higgs boson discovery (July ’12) led to 2013 Nobel prize! 
Higgs 
vs 
Background 
http://arxiv.org/pdf/1402.4735v2.pdf 
Images courtesy CERN / LHC 
HIGGS UCI Dataset: 
21 low-level features AND 
7 high-level derived features (physics formulae) 
Train: 10M rows, Valid: 500k, Test: 500k rows
H2O Deep Learning, @ArnoCandel 12 
Higgs: Derived features are important! 
? ? ? 
Former baseline for AUC: 0.733 and 0.816 
H2O Algorithm low-level H2O AUC all features H2O AUC 
Generalized Linear Model 0.596 0.684 
Random Forest 0.764 0.840 
add 
derived 
Gradient Boosted Trees 0.753 ! 
0.839 
Neural Net 1 hidden layer 0.760 features 
0.830 
H2O Deep Learning ? 
Live Demo: Let’s see what Deep Learning 
can do with low-level features alone!
H2O Deep Learning, @ArnoCandel 
MNIST: digits classification 
MNIST = Digitized handwritten 
digits database (Yann LeCun) 
Yann LeCun: “Yet another advice: don't get 
fooled by people who claim to have a solution 
to Artificial General Intelligence. Ask them what 
error rate they get on MNIST or ImageNet.” 
Data: 28x28=784 pixels with 
(gray-scale) values in 0…255 
Standing world record: 
Without distortions or 
convolutions, the best-ever 
published error rate on test 
set: 0.83% (Microsoft) 
13 
Train: 60,000 rows 784 integer columns 10 classes 
Test: 10,000 rows 784 integer columns 10 classes
H2O Deep Learning, @ArnoCandel 14 
H2O Deep Learning beats MNIST 
Standard 60k/10k data 
No distortions 
No convolutions 
No unsupervised training 
No ensemble 
! 
4 hours on 4 16-core nodes 
World-class result! 
0.87% test set error
H2O Deep Learning, @ArnoCandel 
POJO Model Export for 
Production Scoring 
15 
Plain old Java code is 
auto-generated to take 
your H2O Deep Learning 
models into production!
H2O Deep Learning, @ArnoCandel 
Text Classification 
Goal: Predict the item from 
seller’s text description 
16 
“Vintage 18KT gold Rolex 2 Tone 
in great condition” 
Data: Bag of words vector 0,0,1,0,0,0,0,0,1,0,0,0,1,…,0 
gold vintage condition 
Train: 578,361 rows 8,647 cols 467 classes 
Test: 64,263 rows 8,647 cols 143 classes
H2O Deep Learning, @ArnoCandel 
17 
Text Classification 
Train: 578,361 rows 8,647 cols 467 classes 
Test: 64,263 rows 8,647 cols 143 classes 
Out-Of-The-Box: 11.6% test set error after 10 epochs! 
Predicts the correct class (out of 143) 88.4% of the time! 
Note 1: H2O columnar-compressed in-memory 
store only needs 60 MB to store 5 billion 
values (dense CSV needs 18 GB) 
Note 2: No tuning was done 
(results are for illustration only)
H2O Deep Learning, @ArnoCandel 
MNIST: Unsupervised Anomaly Detection 
with Deep Learning (Autoencoder) 
18 
https://github.com/0xdata/h2o/blob/master/R/tests/testdir_algos/deeplearning/runit_DeepLearning_Anomaly.R 
small/median/large reconstruction error: 
The good The bad The ugly
H2O Deep Learning, @ArnoCandel 19 
Higgs: Live Demo (Continued) 
How well did 
Deep Learning do? 
<your guess?> 
reference paper results 
Any guesses for AUC on low-level features? 
AUC=0.76 was the best for RF/GBM/NN (H2O) 
Let’s see how H2O did in the past 10 minutes!
H2O Deep Learning, @ArnoCandel 20 
Scoring Higgs Models in H2O Steam 
Live Demo on 10-node cluster: 
<10 minutes runtime for all H2O algos! 
Better than LHC baseline of AUC=0.73!
H2O Deep Learning, @ArnoCandel 21 
Higgs Particle Detection with H2O 
HIGGS UCI Dataset: 
21 low-level features AND 
7 high-level derived features 
Train: 10M rows, Test: 500k rows 
Algorithm 
*Nature paper: http://arxiv.org/pdf/1402.4735v2.pdf 
Paper’s 
l-l AUC 
low-level 
H2O AUC 
all features 
H2O AUC 
Parameters (not heavily tuned), 
H2O running on 10 nodes 
Generalized Linear Model - 0.596 0.684 default, binomial 
Random Forest - 0.764 0.840 50 trees, max depth 50 
Gradient Boosted Trees 0.73 0.753 0.839 50 trees, max depth 15 
Neural Net 1 layer 0.733 0.760 0.830 1x300 Rectifier, 100 epochs 
Deep Learning 3 hidden layers 0.836 0.850 - 3x1000 Rectifier, L2=1e-5, 40 epochs 
Deep Learning 4 hidden layers 0.868 0.869 - 4x500 Rectifier, L1=L2=1e-5, 300 epochs 
Deep Learning 5 hidden layers 0.880 0.871 - 5x500 Rectifier, L1=L2=1e-5 
Deep Learning on low-level features alone beats everything else! 
Prelim. H2O results compare well with paper’s results* (TMVA & Theano)
H2O Deep Learning, @ArnoCandel 22 
H2O Deep Learning Booklet 
R Vignette: Explains methods & parameters! 
Grab your copy today 
or download at 
http://t.co/kWzyFMGJ2S 
Even more tips & tricks in our other 
presentations and at H2O World next week!
H2O Deep Learning, @ArnoCandel 
You can participate! 
23 
- Images: Convolutional & Pooling Layers PUB-644 
- Sequences: Recurrent Neural Networks PUB-1052 
- Faster Training: GPGPU support PUB-1013 
- Pre-Training: Stacked Auto-Encoders PUB-1014 
- Ensembles PUB-1072 
- Use H2O at Kaggle Challenges!
H2O Deep Learning, @ArnoCandel 
H2O Kaggle Starter Scripts 
24
H2O Deep Learning, @ArnoCandel 
Join us at H2O World! 
25 
Next Tue & Wed! 
! 
http://h2o.ai/h2o-world/ 
Live Streaming 
Day 1 
• Hands-On Training 
• Supervised 
• Unsupervised 
• Advanced Topics 
• Markting Usecase 
• Product Demos 
• Hacker-Fest with 
Cliff Click (CTO, Hotspot) 
Day 2 
• Speakers from Academia & Industry 
• Trevor Hastie (ML) 
• John Chambers (S, R) 
• Josh Bloch (Java API) 
• Many use cases from customers 
• 3 Top Kaggle Contestants (Top 10) 
• 3 Panel discussions
H2O Deep Learning, @ArnoCandel 
Key Take-Aways 
H2O is an open source predictive analytics platform 
for data scientists and business analysts who need 
scalable and fast machine learning. 
! 
H2O Deep Learning is ready to take your advanced 
analytics to the next level - Try it on your data! 
! 
Join our Community and Meetups! 
https://github.com/h2oai 
h2ostream community forum 
www.h2o.ai/ 
@h2oai 
26 
Thank you!

Mais conteúdo relacionado

Mais procurados

H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614
Sri Ambati
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Spark Summit
 

Mais procurados (20)

H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614H2O Distributed Deep Learning by Arno Candel 071614
H2O Distributed Deep Learning by Arno Candel 071614
 
How to win data science competitions with Deep Learning
How to win data science competitions with Deep LearningHow to win data science competitions with Deep Learning
How to win data science competitions with Deep Learning
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịDistance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
 
Deep Learning and Reinforcement Learning
Deep Learning and Reinforcement LearningDeep Learning and Reinforcement Learning
Deep Learning and Reinforcement Learning
 
Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)Deep Learning with Python (PyData Seattle 2015)
Deep Learning with Python (PyData Seattle 2015)
 
H20: A platform for big math
H20: A platform for big math H20: A platform for big math
H20: A platform for big math
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
 
Machine Learning and Deep Learning with R
Machine Learning and Deep Learning with RMachine Learning and Deep Learning with R
Machine Learning and Deep Learning with R
 
Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Python
 
Using Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical ApplicationsUsing Deep Learning to do Real-Time Scoring in Practical Applications
Using Deep Learning to do Real-Time Scoring in Practical Applications
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)
 
Deep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno CandelDeep Water - GPU Deep Learning for H2O - Arno Candel
Deep Water - GPU Deep Learning for H2O - Arno Candel
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta Meetup
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep Learning
 

Destaque

Scott Clark, Software Engineer, Yelp at MLconf SF
Scott Clark, Software Engineer, Yelp at MLconf SFScott Clark, Software Engineer, Yelp at MLconf SF
Scott Clark, Software Engineer, Yelp at MLconf SF
MLconf
 
Scalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2OScalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2O
odsc
 

Destaque (20)

Lise Getoor, Professor, Computer Science, UC Santa Cruz at MLconf SF
Lise Getoor, Professor, Computer Science, UC Santa Cruz at MLconf SFLise Getoor, Professor, Computer Science, UC Santa Cruz at MLconf SF
Lise Getoor, Professor, Computer Science, UC Santa Cruz at MLconf SF
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SF
 
Scott Clark, Software Engineer, Yelp at MLconf SF
Scott Clark, Software Engineer, Yelp at MLconf SFScott Clark, Software Engineer, Yelp at MLconf SF
Scott Clark, Software Engineer, Yelp at MLconf SF
 
Quoc Le, Software Engineer, Google at MLconf SF
Quoc Le, Software Engineer, Google at MLconf SFQuoc Le, Software Engineer, Google at MLconf SF
Quoc Le, Software Engineer, Google at MLconf SF
 
Steffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFSteffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SF
 
Ameet Talwalkar, assistant professor of Computer Science, UCLA at MLconf SF
Ameet Talwalkar, assistant professor of Computer Science, UCLA at MLconf SFAmeet Talwalkar, assistant professor of Computer Science, UCLA at MLconf SF
Ameet Talwalkar, assistant professor of Computer Science, UCLA at MLconf SF
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
H2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal MalohlavaH2O World - Sparkling Water - Michal Malohlava
H2O World - Sparkling Water - Michal Malohlava
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
GBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O APIGBM in H2O with Cliff Click: H2O API
GBM in H2O with Cliff Click: H2O API
 
Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015Machine Learning with H2O, Spark, and Python at Strata 2015
Machine Learning with H2O, Spark, and Python at Strata 2015
 
PayPal's Fraud Detection with Deep Learning in H2O World 2014
PayPal's Fraud Detection with Deep Learning in H2O World 2014PayPal's Fraud Detection with Deep Learning in H2O World 2014
PayPal's Fraud Detection with Deep Learning in H2O World 2014
 
Transform your Business with AI, Deep Learning and Machine Learning
Transform your Business with AI, Deep Learning and Machine LearningTransform your Business with AI, Deep Learning and Machine Learning
Transform your Business with AI, Deep Learning and Machine Learning
 
H2O Machine Learning and Kalman Filters for Machine Prognostics
H2O Machine Learning and Kalman Filters for Machine PrognosticsH2O Machine Learning and Kalman Filters for Machine Prognostics
H2O Machine Learning and Kalman Filters for Machine Prognostics
 
Intro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LAIntro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LA
 
Hadoop cluster os_tuning_v1.0_20170106_mobile
Hadoop cluster os_tuning_v1.0_20170106_mobileHadoop cluster os_tuning_v1.0_20170106_mobile
Hadoop cluster os_tuning_v1.0_20170106_mobile
 
Scalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2OScalable Data Science and Deep Learning with H2O
Scalable Data Science and Deep Learning with H2O
 
Building Machine Learning Applications with Sparkling Water
Building Machine Learning Applications with Sparkling WaterBuilding Machine Learning Applications with Sparkling Water
Building Machine Learning Applications with Sparkling Water
 
Data Science, Machine Learning, and H2O
Data Science, Machine Learning, and H2OData Science, Machine Learning, and H2O
Data Science, Machine Learning, and H2O
 

Semelhante a MLconf - Distributed Deep Learning for Classification and Regression Problems using H2O

ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochgArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
Sri Ambati
 
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015
Sri Ambati
 
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Sri Ambati
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
yhadoop
 
Data-Centric Parallel Programming
Data-Centric Parallel ProgrammingData-Centric Parallel Programming
Data-Centric Parallel Programming
inside-BigData.com
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
Evert Lammerts
 

Semelhante a MLconf - Distributed Deep Learning for Classification and Regression Problems using H2O (20)

ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochgArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
 
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015
 
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
 
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
 
The Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedInThe Past, Present, and Future of Hadoop at LinkedIn
The Past, Present, and Future of Hadoop at LinkedIn
 
Top 10 Performance Gotchas for scaling in-memory Algorithms.
Top 10 Performance Gotchas for scaling in-memory Algorithms.Top 10 Performance Gotchas for scaling in-memory Algorithms.
Top 10 Performance Gotchas for scaling in-memory Algorithms.
 
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
 
LinkedIn
LinkedInLinkedIn
LinkedIn
 
Cognitive Engine: Boosting Scientific Discovery
Cognitive Engine:  Boosting Scientific DiscoveryCognitive Engine:  Boosting Scientific Discovery
Cognitive Engine: Boosting Scientific Discovery
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Data-Centric Parallel Programming
Data-Centric Parallel ProgrammingData-Centric Parallel Programming
Data-Centric Parallel Programming
 
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
 
CT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloudCT Brown - Doing next-gen sequencing analysis in the cloud
CT Brown - Doing next-gen sequencing analysis in the cloud
 
Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012Talk at Bioinformatics Open Source Conference, 2012
Talk at Bioinformatics Open Source Conference, 2012
 
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
 
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark MeetupDistributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Ceph Day Berlin: Community Update
Ceph Day Berlin:  Community UpdateCeph Day Berlin:  Community Update
Ceph Day Berlin: Community Update
 
Hambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2OHambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2O
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 

Mais de Sri Ambati

Mais de Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Último

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 

Último (20)

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 

MLconf - Distributed Deep Learning for Classification and Regression Problems using H2O

  • 1. Distributed Deep Learning for Classification and Regression Problems using H2O Arno Candel, H2O.ai MLconf SF 2014, San Francisco, 11/14/14 Join us at H2O World 2014 | November 18th and 19th | Computer History Museum, Mountain View ! Register now at http://h2oworld2014.eventbrite.com
  • 2. Who am I? @ArnoCandel PhD in Computational Physics, 2005 from ETH Zurich Switzerland ! 6 years at SLAC - Accelerator Physics Modeling 2 years at Skytree - Machine Learning 11 months at H2O.ai - Machine Learning ! 15 years in HPC/Supercomputing/Modeling ! Named “2014 Big Data All-Star” by Fortune Magazine !
  • 3. H2O Deep Learning, @ArnoCandel Outline Intro (5 mins) Methods & Implementation (5 mins) Results & Live Demos (15 mins) Higgs boson classification MNIST handwritten digits text classification 3
  • 4. H2O Deep Learning, @ArnoCandel Teamwork at H2O.ai Java, Apache v2 Open-Source #1 Java Machine Learning in Github Join the community! 4
  • 5. H2O Deep Learning, @ArnoCandel H2O: Open-Source (Apache v2) Predictive Analytics Platform 5
  • 6. H2O Deep Learning, @ArnoCandel 6 H2O Architecture - Designed for speed, scale, accuracy & ease of use Key technical points: • distributed JVMs + REST API • no Java GC issues (data in byte[], Double) • loss-less number compression • Hadoop integration (v1,YARN) • R package (CRAN) Pre-built fully featured algos: K-Means, NB, GLM, RF, GBM, DL
  • 7. H2O Deep Learning, @ArnoCandel What is Deep Learning? Wikipedia: Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations. Input: Image Output: User ID 7 Example: Facebook DeepFace
  • 8. H2O Deep Learning, @ArnoCandel What is NOT Deep Linear models are not deep (by definition) ! Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy) ! SVMs and Kernel methods are not deep (2 layers: kernel + linear) ! Classification trees are not deep (operate on original input space, no new features generated) 8
  • 9. H2O Deep Learning, @ArnoCandel H2O Deep Learning 1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) ! + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) ! + multi-threaded speedup (async fork/join worker threads operate at FORTRAN speeds) ! + smart algorithms for fast & accurate results (automatic standardization, one-hot encoding of categoricals, missing value imputation, weight & bias initialization, adaptive learning rate, momentum, dropout/l1/L2 regularization, grid search, N-fold cross-validation, checkpointing, load balancing, auto-tuning, model averaging, etc.)s ! = powerful tool for (un)supervised machine learning on real-world data 9 all 320 cores maxed out
  • 10. H2O Deep Learning, @ArnoCandel H2O Deep Learning Architecture K-V HTTPD nodes/JVMs: sync threads: async communication K-V HTTPD w 1 w w 2 1 w w w w 1 3 2 4 w1 w3 w2 w4 3 2 w w2+w4 1+w3 4 1 2 w* = (w1+w2+w3+w4)/4 map: each node trains a copy of the weights and biases with (some* or all of) its local data with asynchronous F/J threads initial model: weights and biases w 1 1 updated model: w* H2O atomic in-memory K-V store reduce: model averaging: average weights and biases from all nodes, speedup is at least #nodes/log(#rows) arxiv:1209.4129v3 i Query & display the model via JSON, WWW Keep iterating over the data (“epochs”), score from time to time *auto-tuned (default) or user-specified number of points per MapReduce iteration 10
  • 11. H2O Deep Learning, @ArnoCandel 11 Application: Higgs Boson Classification Large Hadron Collider: Largest experiment of mankind! $13+ billion, 16.8 miles long, 120 MegaWatts, -456F, 1PB/day, etc. Higgs boson discovery (July ’12) led to 2013 Nobel prize! Higgs vs Background http://arxiv.org/pdf/1402.4735v2.pdf Images courtesy CERN / LHC HIGGS UCI Dataset: 21 low-level features AND 7 high-level derived features (physics formulae) Train: 10M rows, Valid: 500k, Test: 500k rows
  • 12. H2O Deep Learning, @ArnoCandel 12 Higgs: Derived features are important! ? ? ? Former baseline for AUC: 0.733 and 0.816 H2O Algorithm low-level H2O AUC all features H2O AUC Generalized Linear Model 0.596 0.684 Random Forest 0.764 0.840 add derived Gradient Boosted Trees 0.753 ! 0.839 Neural Net 1 hidden layer 0.760 features 0.830 H2O Deep Learning ? Live Demo: Let’s see what Deep Learning can do with low-level features alone!
  • 13. H2O Deep Learning, @ArnoCandel MNIST: digits classification MNIST = Digitized handwritten digits database (Yann LeCun) Yann LeCun: “Yet another advice: don't get fooled by people who claim to have a solution to Artificial General Intelligence. Ask them what error rate they get on MNIST or ImageNet.” Data: 28x28=784 pixels with (gray-scale) values in 0…255 Standing world record: Without distortions or convolutions, the best-ever published error rate on test set: 0.83% (Microsoft) 13 Train: 60,000 rows 784 integer columns 10 classes Test: 10,000 rows 784 integer columns 10 classes
  • 14. H2O Deep Learning, @ArnoCandel 14 H2O Deep Learning beats MNIST Standard 60k/10k data No distortions No convolutions No unsupervised training No ensemble ! 4 hours on 4 16-core nodes World-class result! 0.87% test set error
  • 15. H2O Deep Learning, @ArnoCandel POJO Model Export for Production Scoring 15 Plain old Java code is auto-generated to take your H2O Deep Learning models into production!
  • 16. H2O Deep Learning, @ArnoCandel Text Classification Goal: Predict the item from seller’s text description 16 “Vintage 18KT gold Rolex 2 Tone in great condition” Data: Bag of words vector 0,0,1,0,0,0,0,0,1,0,0,0,1,…,0 gold vintage condition Train: 578,361 rows 8,647 cols 467 classes Test: 64,263 rows 8,647 cols 143 classes
  • 17. H2O Deep Learning, @ArnoCandel 17 Text Classification Train: 578,361 rows 8,647 cols 467 classes Test: 64,263 rows 8,647 cols 143 classes Out-Of-The-Box: 11.6% test set error after 10 epochs! Predicts the correct class (out of 143) 88.4% of the time! Note 1: H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB) Note 2: No tuning was done (results are for illustration only)
  • 18. H2O Deep Learning, @ArnoCandel MNIST: Unsupervised Anomaly Detection with Deep Learning (Autoencoder) 18 https://github.com/0xdata/h2o/blob/master/R/tests/testdir_algos/deeplearning/runit_DeepLearning_Anomaly.R small/median/large reconstruction error: The good The bad The ugly
  • 19. H2O Deep Learning, @ArnoCandel 19 Higgs: Live Demo (Continued) How well did Deep Learning do? <your guess?> reference paper results Any guesses for AUC on low-level features? AUC=0.76 was the best for RF/GBM/NN (H2O) Let’s see how H2O did in the past 10 minutes!
  • 20. H2O Deep Learning, @ArnoCandel 20 Scoring Higgs Models in H2O Steam Live Demo on 10-node cluster: <10 minutes runtime for all H2O algos! Better than LHC baseline of AUC=0.73!
  • 21. H2O Deep Learning, @ArnoCandel 21 Higgs Particle Detection with H2O HIGGS UCI Dataset: 21 low-level features AND 7 high-level derived features Train: 10M rows, Test: 500k rows Algorithm *Nature paper: http://arxiv.org/pdf/1402.4735v2.pdf Paper’s l-l AUC low-level H2O AUC all features H2O AUC Parameters (not heavily tuned), H2O running on 10 nodes Generalized Linear Model - 0.596 0.684 default, binomial Random Forest - 0.764 0.840 50 trees, max depth 50 Gradient Boosted Trees 0.73 0.753 0.839 50 trees, max depth 15 Neural Net 1 layer 0.733 0.760 0.830 1x300 Rectifier, 100 epochs Deep Learning 3 hidden layers 0.836 0.850 - 3x1000 Rectifier, L2=1e-5, 40 epochs Deep Learning 4 hidden layers 0.868 0.869 - 4x500 Rectifier, L1=L2=1e-5, 300 epochs Deep Learning 5 hidden layers 0.880 0.871 - 5x500 Rectifier, L1=L2=1e-5 Deep Learning on low-level features alone beats everything else! Prelim. H2O results compare well with paper’s results* (TMVA & Theano)
  • 22. H2O Deep Learning, @ArnoCandel 22 H2O Deep Learning Booklet R Vignette: Explains methods & parameters! Grab your copy today or download at http://t.co/kWzyFMGJ2S Even more tips & tricks in our other presentations and at H2O World next week!
  • 23. H2O Deep Learning, @ArnoCandel You can participate! 23 - Images: Convolutional & Pooling Layers PUB-644 - Sequences: Recurrent Neural Networks PUB-1052 - Faster Training: GPGPU support PUB-1013 - Pre-Training: Stacked Auto-Encoders PUB-1014 - Ensembles PUB-1072 - Use H2O at Kaggle Challenges!
  • 24. H2O Deep Learning, @ArnoCandel H2O Kaggle Starter Scripts 24
  • 25. H2O Deep Learning, @ArnoCandel Join us at H2O World! 25 Next Tue & Wed! ! http://h2o.ai/h2o-world/ Live Streaming Day 1 • Hands-On Training • Supervised • Unsupervised • Advanced Topics • Markting Usecase • Product Demos • Hacker-Fest with Cliff Click (CTO, Hotspot) Day 2 • Speakers from Academia & Industry • Trevor Hastie (ML) • John Chambers (S, R) • Josh Bloch (Java API) • Many use cases from customers • 3 Top Kaggle Contestants (Top 10) • 3 Panel discussions
  • 26. H2O Deep Learning, @ArnoCandel Key Take-Aways H2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning. ! H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data! ! Join our Community and Meetups! https://github.com/h2oai h2ostream community forum www.h2o.ai/ @h2oai 26 Thank you!