2. Outline
• Background
• Problem Definition
• Unsupervised Feature Learning
• Our Work
• Sparse Auto-encoder
• Preprocessing: PCA and Whitening
• Self-Taught Learning and Unsupervised Feature Learning
• References
3. Background
• Machine learning is one of the cornerstone fields of Artificial Intelligence, in which machines learn to act autonomously and to react to new situations without being pre-programmed.
• Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas.
• There are many families of learning algorithms, among them [1]:
1) Supervised learning
2) Unsupervised learning
4. Problem Definition
• The goal of the supervised learning method can be summarized as one of two tasks:
  • Regression
  • Classification
• The first step in training a machine with the supervised learning method is collecting the data set, which in most cases is a very difficult and expensive process.
• The alternative approach is to measure and use everything, which leads to other problems, e.g. noisy data [2].
5. Unsupervised feature learning
• The unsupervised feature learning approach learns higher-level representations of unlabeled data by detecting patterns with various algorithms, e.g. sparse coding algorithms [3].
• It is a self-taught learning framework that transfers knowledge from unlabeled data, which is much easier to obtain, and uses it as a preprocessing step to improve supervised inductive models.
• This framework was developed to address current limitations of supervised learning and to increase its accuracy regardless of the domain of interest (vision, sound, or text) [4].
6. Our Work
• We will present some of the methods for unsupervised feature learning and deep learning, each of which automatically learns a good representation of the input from unlabeled data.
• We will concentrate on the following algorithms, with more details in the following slides:
  • PCA and Whitening
  • Sparse Autoencoder
  • Self-Taught Learning
• We will also focus on applying these algorithms to learn features from images.
9. Neural Network
Before we get further into the details of the algorithm, we need to quickly go through neural networks.
To describe neural networks, we begin with the simplest possible neural network, one that comprises a single "neuron." We will use the following diagram to denote a single neuron [5].
Single Neuron [8]
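The computation performed by a single neuron can be sketched in a few lines. This is a minimal illustration, assuming a sigmoid activation (the slide does not name the activation function); the input, weight, and bias values are hypothetical and chosen only so the result is easy to check by hand.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation f(z) = 1 / (1 + exp(-z)) (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    """Output of one neuron: f(w . x + b), where b is the intercept (bias) term."""
    return sigmoid(np.dot(w, x) + b)

# Hypothetical example values, for illustration only.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.0])
b = 0.0

# w . x + b = 0.5 - 0.5 + 0.0 = 0, and sigmoid(0) = 0.5
h = neuron_output(x, w, b)
```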
13. Neural Network Model
• A neural network is built by hooking together many of our simple "neurons," so that the output of one neuron can be the input of another. For example, here is a small neural network.
• The circles labeled "+1" are called bias units and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer. The middle layer of nodes is called the hidden layer, because its values are not observed in the training set [8].
Small Neural Network[8]
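The layered structure described above can be sketched as a forward pass. This is an illustrative sketch only: the layer sizes (3 inputs, 3 hidden units, 1 output), the sigmoid activation, and the random weights are assumptions, since the slide text does not specify them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: input layer -> hidden layer -> output layer."""
    a2 = sigmoid(W1 @ x + b1)    # hidden activations (not observed in the training set)
    a3 = sigmoid(W2 @ a2 + b2)   # network output
    return a2, a3

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 3, 1                      # assumed layer sizes
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)                              # weights attached to the "+1" bias unit
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))
b2 = np.zeros(n_out)

a2, a3 = forward(np.array([0.2, 0.5, 0.9]), W1, b1, W2, b2)
```

The bias vectors `b1` and `b2` play the role of the "+1" units: they add an intercept to each layer's weighted sum.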
20. Autoencoder Implementation
• We implemented a sparse autoencoder, trained on 8×8 image patches using the L-BFGS optimization algorithm.
Step 1: Generate the training set
The training set is a random sample of 200 patches, each 8×8 pixels, drawn from the dataset.
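Step 1 might look like the sketch below. The function name `sample_patches`, the array layout, and the random stand-in images are assumptions made for illustration; the slides do not say how the dataset is stored.

```python
import numpy as np

def sample_patches(images, n_patches=200, patch_size=8, seed=0):
    """Draw random square patches from a stack of grayscale images.

    images: array of shape (n_images, height, width).
    Returns an array of shape (patch_size * patch_size, n_patches),
    with one flattened patch per column.
    """
    rng = np.random.default_rng(seed)
    n_images, height, width = images.shape
    patches = np.empty((patch_size * patch_size, n_patches))
    for k in range(n_patches):
        i = rng.integers(n_images)                   # which image
        r = rng.integers(height - patch_size + 1)    # top-left row of the patch
        c = rng.integers(width - patch_size + 1)     # top-left column of the patch
        patches[:, k] = images[i, r:r + patch_size, c:c + patch_size].ravel()
    return patches

# Stand-in data: 10 random 64x64 "images" in place of the real dataset.
images = np.random.default_rng(1).random((10, 64, 64))
patches = sample_patches(images)
```

Each column of `patches` has 64 entries, which matches the 64 input units of the network used later.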
21. Autoencoder Implementation
Step 2: Sparse autoencoder objective
Compute the sparse autoencoder cost function J_sparse(W, b) and the corresponding derivatives of J_sparse with respect to the different parameters.
Step 3: Train the sparse autoencoder
After computing J_sparse and its derivatives, we minimize J_sparse with respect to its parameters, and thereby train our sparse autoencoder. We trained our sparse autoencoder with the L-BFGS algorithm. Our neural network for training has 64 input units, 25 hidden units, and 64 output units.
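Steps 2 and 3 can be sketched as a function that returns both the cost and its gradient. This is a sketch under stated assumptions, not the code used for the reported results: it uses the common formulation of J_sparse (squared reconstruction error, weight decay, and a KL-divergence sparsity penalty) with sigmoid units, and the hyperparameter values `lam`, `rho`, and `beta` are assumed defaults that the slides do not state. The data is a random stand-in for the 200 patches.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, n_vis, n_hid):
    """Split the flat parameter vector into W1, W2, b1, b2."""
    s = n_hid * n_vis
    W1 = theta[:s].reshape(n_hid, n_vis)
    W2 = theta[s:2 * s].reshape(n_vis, n_hid)
    b1 = theta[2 * s:2 * s + n_hid]
    b2 = theta[2 * s + n_hid:]
    return W1, W2, b1, b2

def sparse_ae_cost(theta, X, n_vis, n_hid, lam=1e-4, rho=0.01, beta=3.0):
    """Return (J_sparse, gradient) for data X of shape (n_vis, m)."""
    W1, W2, b1, b2 = unpack(theta, n_vis, n_hid)
    m = X.shape[1]
    a2 = sigmoid(W1 @ X + b1[:, None])     # hidden activations
    a3 = sigmoid(W2 @ a2 + b2[:, None])    # reconstruction of the input
    rho_hat = a2.mean(axis=1)              # average activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = ((0.5 / m) * np.sum((a3 - X) ** 2)
            + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
            + beta * kl)
    # Backpropagation; the sparsity term enters the hidden-layer delta.
    d3 = (a3 - X) * a3 * (1 - a3)
    sparsity = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparsity[:, None]) * a2 * (1 - a2)
    gW1 = d2 @ X.T / m + lam * W1
    gW2 = d3 @ a2.T / m + lam * W2
    grad = np.concatenate([gW1.ravel(), gW2.ravel(),
                           d2.mean(axis=1), d3.mean(axis=1)])
    return cost, grad

rng = np.random.default_rng(0)
n_vis, n_hid = 64, 25                          # 64 input/output units, 25 hidden units
X = rng.random((n_vis, 200)) * 0.8 + 0.1       # stand-in for 200 normalized 8x8 patches
r = np.sqrt(6.0 / (n_vis + n_hid + 1))         # small random initial weights
theta0 = np.concatenate([rng.uniform(-r, r, 2 * n_vis * n_hid),
                         np.zeros(n_hid + n_vis)])
cost0, grad0 = sparse_ae_cost(theta0, X, n_vis, n_hid)
```

A function that returns the pair (cost, gradient) is the form quasi-Newton optimizers expect. With SciPy, for example, the training step could be written as `scipy.optimize.minimize(sparse_ae_cost, theta0, args=(X, n_vis, n_hid), jac=True, method="L-BFGS-B")`. Comparing `grad0` against finite differences of the cost is a useful check before training.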
22. Autoencoder Implementation Results
After training, the sparse autoencoder successfully learned a set of edge detectors.
CPU: Intel Core i7 quad-core processor, 2.7 GHz
RAM: 6 GB
Training set: 200 patches of 8×8 pixels
Neural network for training: 64 input units, 25 hidden units, and 64 output units
25. Principal Component Analysis – PCA
• PCA is a dimensionality reduction technique that removes the redundancy in highly correlated variables without sacrificing much detail [7].
26. PCA – Example
• Consider the 2D data example shown in the figure.
• This data has already been pre-processed using mean normalization.
• We want to find the principal directions of variation.
2D data example[8]
27. PCA – Example (Cont’d)
2D data example with the principal directions u1 and u2 [8]
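The principal directions u1 and u2 can be computed as the eigenvectors of the data covariance matrix. The sketch below uses synthetic correlated 2D data as a stand-in for the example in the figure; the whitening constant `eps` is an assumed small regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, correlated 2D data standing in for the example; each column is a point.
t = rng.normal(size=500)
X = np.vstack([t, 0.5 * t + 0.1 * rng.normal(size=500)])
X = X - X.mean(axis=1, keepdims=True)        # mean normalization

sigma = X @ X.T / X.shape[1]                 # 2x2 covariance matrix
eigvals, U = np.linalg.eigh(sigma)           # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]            # reorder so the largest variance comes first
eigvals, U = eigvals[order], U[:, order]
u1, u2 = U[:, 0], U[:, 1]                    # principal directions of variation

X_rot = U.T @ X                              # data expressed in the (u1, u2) basis
X_reduced = u1 @ X                           # 1D representation: projection onto u1 only

eps = 1e-5                                   # assumed regularizer for whitening
X_white = X_rot / np.sqrt(eigvals[:, None] + eps)   # PCA whitening: unit variance per axis
```

Keeping only the projection onto u1 reduces the data from two dimensions to one while retaining the direction with the most variance.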
34. Self-Taught Learning and Unsupervised Feature Learning
Given an unlabeled data set, we can train a sparse autoencoder to extract features that give us a better, more condensed representation of the data.
Neural Network[8]
35. Self-Taught Learning and Unsupervised Feature Learning
• Once training is done, the network can be used to find better features to represent the input: the features are the activations of the network's hidden layer [8].
Input layer of Neural Network[8]
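Using the hidden-layer activations as features amounts to running only the first half of the autoencoder. In this sketch the function name `extract_features` is hypothetical, a sigmoid activation is assumed, and the weights are random placeholders standing in for a trained autoencoder's parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(W1, b1, X):
    """Map inputs X of shape (n_inputs, m) to hidden activations of shape (n_hidden, m)."""
    return sigmoid(W1 @ X + b1[:, None])

# Placeholder parameters; in practice W1 and b1 come from the trained autoencoder.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(25, 64))
b1 = np.zeros(25)
X = rng.random((64, 5))                      # five stand-in input vectors, one per column

features = extract_features(W1, b1, X)       # one 25-dimensional feature vector per input
```

The output layer of the autoencoder is not needed at this stage; only the learned input-to-hidden mapping is kept.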
38. Self-Taught Learning Application
• We used the self-taught learning paradigm with the sparse autoencoder and softmax
classifier to build a classifier for handwritten digits.
• The goal is to distinguish between the digits 0 to 4. We use the digits 5 to 9 as our "unlabeled" dataset, and then train the softmax classifier on a labeled dataset of the digits 0 to 4.
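The data split described above can be sketched as follows. The function name `split_self_taught` and the random stand-in data are hypothetical. The half-and-half division of the labeled digits is an assumption, chosen because the setup slide lists equally sized supervised training and testing sets.

```python
import numpy as np

def split_self_taught(images, labels, seed=0):
    """Split digit data into an unlabeled set (digits 5-9) and
    labeled training and test sets (digits 0-4)."""
    unlabeled = images[labels >= 5]              # the labels of these examples are discarded
    labeled_idx = np.flatnonzero(labels <= 4)
    rng = np.random.default_rng(seed)
    rng.shuffle(labeled_idx)
    half = len(labeled_idx) // 2                 # assumed even train/test split
    train_idx, test_idx = labeled_idx[:half], labeled_idx[half:]
    train = (images[train_idx], labels[train_idx])
    test = (images[test_idx], labels[test_idx])
    return unlabeled, train, test

# Stand-in data with the same layout as flattened 28x28 digit images.
rng = np.random.default_rng(1)
images = rng.random((1000, 784))
labels = rng.integers(0, 10, size=1000)

unlabeled, (X_train, y_train), (X_test, y_test) = split_self_taught(images, labels)
```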
39. Self-Taught Learning Implementation
Step 1: Generate the input and test data sets
We used the datasets from the MNIST Handwritten Digit Database for this project.
Step 2: Train the sparse autoencoder
We used the unlabeled data (the digits from 5 to 9) to train a sparse autoencoder. After training is complete, a visualization of the learned features shows pen strokes.
Step 3: Extract features
After the sparse autoencoder is trained, we use it to extract features from the handwritten digit images.
Step 4: Train and test the softmax (multinomial logistic regression) model
We train a softmax classifier using the training set features and labels, and finally compute the predictions and the accuracy.
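Step 4 can be sketched with a small softmax regression. This is an illustration, not the project's implementation: it uses plain batch gradient descent rather than a quasi-Newton optimizer, omits a bias term, and runs on synthetic features. The function names and the values of `lam`, `lr`, and `n_iter` are assumptions.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=0, keepdims=True)     # subtract the column max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def train_softmax(F, y, n_classes, lam=1e-4, lr=0.5, n_iter=300):
    """Fit softmax regression on features F (n_features, m) with integer labels y."""
    n_features, m = F.shape
    Y = np.eye(n_classes)[:, y]              # one-hot targets, shape (n_classes, m)
    theta = np.zeros((n_classes, n_features))
    for _ in range(n_iter):
        P = softmax(theta @ F)               # predicted class probabilities
        grad = (P - Y) @ F.T / m + lam * theta
        theta -= lr * grad                   # plain gradient descent step
    return theta

def predict(theta, F):
    return np.argmax(theta @ F, axis=0)

# Synthetic stand-in features: 5 classes, 8 features, one noisy cluster per class.
rng = np.random.default_rng(0)
centers = np.eye(8)[:, :5]
y_all = rng.integers(0, 5, size=500)
F_all = centers[:, y_all] + 0.1 * rng.normal(size=(8, 500))
F_train, y_train = F_all[:, :400], y_all[:400]
F_test, y_test = F_all[:, 400:], y_all[400:]

theta = train_softmax(F_train, y_train, n_classes=5)
accuracy = np.mean(predict(theta, F_test) == y_test)
```

In the self-taught setting, `F_train` and `F_test` would be the hidden-layer features extracted by the sparse autoencoder in Step 3 rather than raw pixels.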
40. Self-Taught Learning Setup Environment
CPU: Intel Core i7 quad-core processor, 2.7 GHz
RAM: 6 GB
Training set: 60,000 examples from the MNIST database
Unlabeled set: 29,404 examples
Supervised training set: 15,298 examples
Supervised testing set: 15,298 examples
41. Self-Taught Learning Results
After training is complete, a visualization of the learned features shows pen strokes.
42. Self-Taught Learning Analysis
We compared the outputs of our application with the outputs of the Stanford course tutorial [8].

                        Our classifier    Tutorial's classifier
Training time           16 minutes        25 minutes
Classifier accuracy     98.208916%        98%
43. Future Work
If we parallelize our code, or run the training on a GPU for example, we expect performance to improve and the time needed to train the classifier to decrease.
44. References
[1] Taiwo Oladipupo Ayodele. New Advances in Machine Learning. InTech, 2010.
[2] S. B. Kotsiantis, I. D. Zaharakis, and P. E. Pintelas. Supervised machine learning: A review of classification techniques. 31:249-268, 2007.
[3] Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Ng. Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems, pages 801-808, 2006.
[4] Bruno A. Olshausen et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607-609, 1996.
[5] Simon O. Haykin. "Multilayer Perceptron," in Neural Networks and Learning Machines, 3rd ed. Prentice Hall, 2009.
[6] Andrew Ng. CS294A lecture notes, topic: "Sparse autoencoder." Stanford University, Jan. 11, 2011. Available: http://www.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf. [Accessed Dec. 10, 2013].
[7] Aapo Hyvärinen, Jarmo Hurri, and Patrik O. Hoyer. "Principal components and whitening," in Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, vol. 39. Springer-Verlag, 2009, pp. 97-137.
[8] Andrew Ng, Jiquan Ngiam, Chuan Yu Foo, Yifan Mai, and Caroline Suen. "UFLDL Tutorial," April 7, 2013. [Online]. Available: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial. [Accessed Dec. 10, 2013].