4. Overview
• What is Deep Learning?
• Deep Belief Networks
• Implementation on Hadoop/YARN
• Results
6. What is Deep Learning?
Algorithms that try to learn simple features in the lower layers
and more complex features in the higher layers
7. Interesting Properties of Deep Learning
Mitigates the overfitting problem in neural networks.
Introduces new techniques for "unsupervised feature learning".
Introduces new, more automatic ways to figure out which parts of your data you should feed into your learning algorithm.
8. Chasing Nature
Learning sparse representations of auditory signals leads to filters that closely correspond to neurons in early audio processing in mammals.
When applied to speech, the learned representations showed a striking resemblance to the cochlear filters of the auditory system.
9. Yann LeCun on Deep Learning
Has become the dominant method for acoustic
modeling in speech recognition
Quickly becoming the dominant method for several
vision tasks such as
object recognition
object detection
semantic segmentation.
11. What is a Deep Belief Network?
Generative probabilistic model
Composed of one visible layer and many hidden layers
Each hidden layer learns relationships between the units in the layer below it
Higher-layer representations tend to become more complex
12. Restricted Boltzmann Machines
Unsupervised model: Does feature learning by repeated sampling of the
input data. Learns how to reconstruct data for good feature detection.
RBMs have different formulas for different kinds of data:
Binary
Continuous
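To make "learns how to reconstruct data" concrete, here is a minimal sketch of a binary RBM trained with one step of contrastive divergence (CD-1). It is an illustration only, not DL4J code: the function names, the toy patterns, and the hyperparameters are all assumptions chosen for readability.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    # Draw a binary state for each unit from its activation probability.
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def hidden_probs(v, W, c):
    # W[i][j] connects visible unit i to hidden unit j; c holds hidden biases.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # b holds visible biases.
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]


def cd1_step(v0, W, b, c, lr, rng):
    """One CD-1 update on a single example; returns the squared reconstruction error."""
    ph0 = hidden_probs(v0, W, c)       # positive phase: driven by the data
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, b)      # reconstruction of the input
    ph1 = hidden_probs(pv1, W, c)      # negative phase: driven by the reconstruction
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
    for i in range(len(b)):
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return sum((v0[i] - pv1[i]) ** 2 for i in range(len(b)))


def train_rbm(data, n_hidden, epochs=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    errors = []                        # total reconstruction error per epoch
    for _ in range(epochs):
        errors.append(sum(cd1_step(v, W, b, c, lr, rng) for v in data))
    return W, b, c, errors


if __name__ == "__main__":
    # Two hypothetical binary patterns, each repeated twice.
    toy = [[1, 1, 0, 0], [0, 0, 1, 1]] * 2
    _, _, _, errs = train_rbm(toy, n_hidden=2)
    print("first epoch error:", errs[0], "last epoch error:", errs[-1])
```

A continuous-input RBM would swap the visible-layer sampling rule (for example, Gaussian visible units); that variant is not shown here.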
13. DeepLearning4J
Implementation in Java
Self-contained and built on Akka, Hazelcast, and jblas
Distributed, so it can run faster and with more features than current Theano-based implementations.
Talks to any data source, expects one format.
14. Vectorized Implementation
Handles lots of data concurrently.
Processes any number of examples at once without changing the code.
Faster: Allows for native/GPU execution.
One format: Everything is a matrix.
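As a rough illustration of "everything is a matrix," the sketch below runs the same layer function on one example and on a batch of three. It uses plain Python lists instead of a native matrix library such as jblas, and the names `matmul` and `layer_forward` are hypothetical, not DL4J APIs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matmul(A, B):
    # (n x k) times (k x m) -> (n x m), using plain nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def layer_forward(X, W, bias):
    """X is (n_examples x n_in); W is (n_in x n_out); bias has n_out entries."""
    Z = matmul(X, W)
    return [[sigmoid(z + b) for z, b in zip(row, bias)] for row in Z]


if __name__ == "__main__":
    W = [[0.5, -0.5], [0.25, 0.75]]
    bias = [0.0, 0.1]
    one = layer_forward([[1.0, 0.0]], W, bias)                           # 1 example
    batch = layer_forward([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], W, bias)  # 3 examples
    print(one)
    print(batch)
```

Because the input is always a matrix with one row per example, the caller changes only the number of rows; a native or GPU backend can then replace `matmul` without touching the layer code.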
15. DL4J vs Theano Perf
GPUs are inherently faster than ordinary native CPU execution.
Theano is not distributed, and GPUs have very little RAM.
DL4J allows for situations where you have to “throw CPUs at it.”
16. What are Good Applications for Deep Learning?
Image Processing
High MNIST Scores
Audio Processing
Current Champ on TIMIT dataset
Text / NLP Processing
Word2vec, etc
18. Past Work: Parallel Iterative Algorithms on YARN
Started with
Parallel linear and logistic regression
Parallel neural networks
Packaged in Metronome
100% Java, Apache License 2.0, on GitHub
21. SGD: Serial vs Parallel
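[Diagram. Serial: a single model is trained on all of the training data. Parallel: the training data is divided into splits (Split 1, Split 2, Split 3, …); Worker 1 through Worker N each train a partial model on one split, and the master combines the partial models into the global model.]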
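The master/worker layout on this slide can be sketched as parameter averaging over SGD workers. The example below is a single-process simulation on a toy linear model: each "worker" is just a function call, and `sgd_pass`, `average`, and `train_parallel` are hypothetical names rather than Metronome or IterativeReduce APIs.

```python
def sgd_pass(weights, split, lr):
    """Worker step: one SGD pass over a data split, starting from the global model."""
    w = list(weights)
    for x, y in split:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # Gradient of squared error for a linear model.
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def average(models):
    """Master step: element-wise mean of the workers' partial models."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def train_parallel(splits, n_features, rounds=1000, lr=0.05):
    global_model = [0.0] * n_features
    for _ in range(rounds):
        # On a cluster each split would be handled by its own worker;
        # here the workers are simulated one after another.
        partials = [sgd_pass(global_model, split, lr) for split in splits]
        global_model = average(partials)
    return global_model


if __name__ == "__main__":
    # Toy data for y = 2*x + 1; the second feature is a constant bias input.
    points = [([x, 1.0], 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5)]
    splits = [points[0:2], points[2:4], points[4:6]]
    print(train_parallel(splits, n_features=2))
```

Each round sends the current global model to every worker, lets each run SGD on its own split, and averages the results, which keeps the communication between master and workers to one model per worker per round.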
22. Managing Resources
Running through YARN on Hadoop is important
Allows for workflow scheduling
Allows for scheduler oversight
Allows the jobs to be first-class citizens on Hadoop and to share resources nicely
23. Parallelizing Deep Belief Networks
Two-phase training
Pre-train
Fine-tune
Each phase can make multiple passes over the dataset
The entire network's parameters are averaged at the master
24. PreTrain and Lots of Data
We’re exploring how to better leverage the unsupervised aspects of the PreTrain phase of Deep Belief Networks
Allows for the use of far less labeled data
Allows us to more easily model the massive amounts of structured data in HDFS
26. DBNs on IterativeReduce (IR) Performance
Faster to train.
Parameter averaging is an automatic form of regularization.
AdaGrad with IR allows for better generalization across different features and more even pacing.
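The "even pacing" point can be illustrated with the standard AdaGrad update, which divides each parameter's step by the square root of its accumulated squared gradients. This is a generic sketch of the textbook rule; how the deck's implementation combines it with IterativeReduce is not shown in the slides, and the function name and defaults here are assumptions.

```python
import math


def adagrad_update(weights, grads, hist, lr=0.1, eps=1e-8):
    """Apply one AdaGrad step in place.

    hist accumulates squared gradients per parameter, so parameters with
    large or frequent gradients get smaller effective learning rates.
    """
    for i, g in enumerate(grads):
        hist[i] += g * g
        weights[i] -= lr * g / (math.sqrt(hist[i]) + eps)
    return weights, hist


if __name__ == "__main__":
    w = [0.0, 0.0]
    h = [0.0, 0.0]
    # Gradients that differ by a factor of 100 still move both weights
    # by almost the same amount on the first step.
    adagrad_update(w, [10.0, 0.1], h)
    print(w)
```

Plain SGD would move the first weight 100 times farther than the second on the same gradients; the per-parameter scaling is what evens out the pace across features.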
27. Scale Out Metrics
Batches of records can be processed by as many
workers as there are data splits
Message passing overhead is minimal
Exhibits linear scaling
Example: 3x workers, 3x faster learning
28. Usage From Command Line
Run Deep Learning on Hadoop
yarn jar iterativereduce-0.1-SNAPSHOT.jar [props file]
Evaluate model
./score_model.sh [props file]
Editor's Notes
Bottou similar to Xu2010 in the 2010 paper
Benefits of data flow: runtime can decide where to run tasks and can automatically recover from failures
Acyclic data flow is a powerful abstraction, but is not efficient for applications that repeatedly reuse a working set of data:
Iterative algorithms (many in machine learning)
• No single programming model or framework can excel at every problem; there are always tradeoffs between simplicity, expressivity, fault tolerance, performance, etc.
POLR: Parallel Online Logistic Regression
Talking points:
We wanted to start with a tool known to the Hadoop community, with expected characteristics
Mahout’s SGD is well known, so we used it as a starting point