1. Image Classification with
Deep Neural Networks
February 6, 2016
ImageNet Classification with Deep Convolutional Neural Networks
A. Krizhevsky, I. Sutskever, G. Hinton
Papers We Love
Classification: classifies data into one of a set of discrete classes
e.g. classifying handwritten digits
The cost function for a classification task may be the logistic (cross-entropy) loss or the log-likelihood
Regression: predicts a continuous, real-valued output
e.g. stock price prediction
A common cost function for regression problems is MSE (mean squared error)
• The input image is an array of numbers to the computer
• Assign a label to the input image from a fixed set of categories
• One of the core problems in computer vision
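As a concrete illustration of "an image is an array of numbers": a grayscale image is a 2-D array of pixel intensities, and a color image adds a channel dimension. The pixel values below are made up for the sketch:

```python
import numpy as np

# A tiny 3x3 grayscale "image": each entry is a pixel intensity in 0..255.
# A color image would have shape (H, W, 3) — one value per RGB channel.
image = np.array([[  0,  50, 100],
                  [150, 200, 255],
                  [ 30,  60,  90]], dtype=np.uint8)

print(image.shape)   # (3, 3)
print(image.dtype)   # uint8
```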
15. Learning a Classifier
Gradient Descent Algorithm
Calculate the cost (loss) function J(w)
Calculate the gradient ∂J(w)/∂w
Stochastic Gradient Descent: updates the weights after each training example.
Mini-batch SGD: updates the weights after each mini-batch of examples.
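The mini-batch SGD loop can be sketched in a few lines. This toy example fits a linear model with a squared-error objective purely to keep the gradients simple; the data, learning rate, and batch size are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2*x + 1 plus a little noise.
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + 1 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0
lr, batch_size = 0.1, 10

for epoch in range(50):
    idx = rng.permutation(len(X))          # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb
        # Gradients of the mean-squared-error loss J(w, b) on this mini-batch
        grad_w = 2 * np.mean(err * xb)
        grad_b = 2 * np.mean(err)
        # Update step: move against the gradient
        w -= lr * grad_w
        b -= lr * grad_b

print(f"w={w:.2f}, b={b:.2f}")   # should land close to w=2, b=1
```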
We do not use a squared-error measure for training; instead we use a softmax
function at the output layer.
The right cost function is then the negative log-likelihood.
16. Learning a Classifier - Negative Log-Likelihood
NLL(θ, D) = − Σ_i log P(Y = y^(i) | x^(i), θ)
where D is the dataset, θ is the weight parameter, (x^(i), y^(i)) is the i-th training example, and Y is the target variable.
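A minimal sketch of the softmax output layer and the negative log-likelihood computed from it; the logits and target labels are made-up values for illustration:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability; rows are examples.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, targets):
    # Negative log-likelihood: -sum_i log P(Y = y^(i) | x^(i), theta)
    n = len(targets)
    return -np.log(probs[np.arange(n), targets]).sum()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
targets = np.array([0, 1])   # correct class index for each example

probs = softmax(logits)
print(nll(probs, targets))   # small positive number: both predictions are confident and correct
```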
When the number of training examples is small and the architecture is deep, the network performs well on the training data but worse on the test data, i.e. it overfits.
18. Overfitting Mitigation
Data Augmentation:
Artificially creating more data samples from existing data through various
transformations of the images (e.g. rotation, reflection, skewing) and/or dividing
images into small patches and averaging all of their predictions.
Applying PCA to the training examples to find the principal components, which
correspond to the intensity and color of the illumination. Artificial data is then created by
adding randomly scaled eigenvectors to the training examples:
[I_xy^R, I_xy^G, I_xy^B]^T += [p1, p2, p3] [α1 λ1, α2 λ2, α3 λ3]^T
where p_i and λ_i are the i-th eigenvector and eigenvalue of the 3×3 covariance matrix of RGB pixel values, and each α_i is a random Gaussian draw.
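The PCA color augmentation can be sketched as follows. The random pixel data, the sigma value, and the function names are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the training set: N pixels x 3 RGB channels in [0, 1].
pixels = rng.uniform(0, 1, size=(10_000, 3))

# 3x3 covariance of RGB values across all training pixels,
# then its eigenvalues (lambda_i) and eigenvectors (p_i, as columns).
cov = np.cov(pixels, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

def augment(image, sigma=0.1):
    # image: (H, W, 3) in [0, 1].  Draw alpha_i ~ N(0, sigma) once per image
    # and add [p1 p2 p3] [alpha_1*lambda_1, ..., alpha_3*lambda_3]^T to every pixel.
    alphas = rng.normal(0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)
    return np.clip(image + shift, 0, 1)

img = rng.uniform(0, 1, size=(4, 4, 3))
aug = augment(img)
print(aug.shape)   # (4, 4, 3) — same image shape, shifted illumination
```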
Dropout: a technique to reduce overfitting
Dropout prevents complex co-adaptation on the training data.
Randomly omit each hidden unit with probability 0.5.
It is like randomly sampling from 2^H architectures, where H is the number of units in a hidden layer.
It is an efficient way to average many large neural nets.
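A minimal sketch of dropout applied to a layer of activations. This uses "inverted" dropout, a common variant that rescales the surviving units at training time so no rescaling is needed at test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, train=True):
    # During training, omit each unit with probability p and scale the
    # survivors by 1/(1-p) so the expected activation matches test time,
    # when every unit is kept.
    if not train:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones((2, 8))            # a batch of hidden-layer activations
dropped = dropout(h, p=0.5)
print(dropped)                 # roughly half the entries zeroed, the rest scaled to 2.0
```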
20. ImageNet Classification with Deep CNN
Improvement increases with larger datasets
Need a model with large learning capacity
A CNN's capacity can be controlled with its depth and breadth
Achieved the best results in ILSVRC-2010 and ILSVRC-2012
22. Training on Multiple GPUs
Current GPUs are well suited to cross-GPU parallelization, since they can read from and write to one another's memory directly.
Half of the neurons are put on each GPU, and the GPUs communicate only in certain layers.
The connectivity pattern is chosen by cross-validation.
ILSVRC started in 2010 as part of the Pascal Visual Object Classes challenge.
1.2 million training images, 50K validation images and 150K testing images.
ILSVRC uses roughly 1000 images for each of 1000 categories.
Two error measures: top-1 error and top-5 error.
Top-5 error: the fraction of test images for which the correct label is not among the
five labels the model considers most probable.
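The top-k error described above can be computed as in this sketch; the scores and labels are made-up values:

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    # scores: (N, C) model outputs; labels: (N,) true class indices.
    # An example counts as an error when its true label is not among
    # the k highest-scoring classes.
    topk = np.argsort(scores, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

scores = np.array([[0.1, 0.9, 0.0, 0.0, 0.0, 0.0],
                   [0.6, 0.1, 0.1, 0.1, 0.05, 0.05]])
labels = np.array([1, 3])
print(top_k_error(scores, labels, k=5))  # 0.0 — both true labels are in the top 5
print(top_k_error(scores, labels, k=1))  # 0.5 — the second example's top-1 is class 0
```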
28. References
 A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in NIPS, 2012.
 T. Sainath, B. Kingsbury, A.-r. Mohamed and B. Ramabhadran, "Learning Filter Banks within a Deep Neural Network Framework," in IEEE ASRU, 2013.
 A. Graves, A.-r. Mohamed and G. Hinton, "Speech Recognition with Deep Recurrent Neural Networks," University of Toronto.
 A. Graves, "Generating Sequences with Recurrent Neural Networks," arXiv, 2014.
 O. Vinyals and Q. V. Le, "A Neural Conversational Model," arXiv, 2015.
 R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," UC Berkeley.
 A. Karpathy, "CS231n Convolutional Neural Networks for Visual Recognition," Stanford University, [Online].
 I. Sutskever, "Training Recurrent Neural Networks," University of Toronto, 2013.
 "Convolutional Neural Networks (LeNet)," [Online]. Available: http://deeplearning.net/tutorial/lenet.html.
 E. Culurciello, A. Dundar, J. Jin and J. Bates, "An Analysis of the Connections Between Layers of Deep Neural Networks," arXiv, 2013.
 M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks," arXiv, 2013.
 G. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, "Improving Neural Networks by Preventing Co-adaptation of Feature Detectors," arXiv, 2012.
 A. Karpathy and L. Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions," Stanford University, 2014.
 O. Vinyals, A. Toshev, S. Bengio and D. Erhan, "Show and Tell: A Neural Image Caption Generator," Google Inc., 2014.
 I. Sutskever, J. Martens and G. Hinton, "Generating Text with Recurrent Neural Networks," in 28th International Conference on Machine Learning, Bellevue, 2011.
 "Theano," [Online]. Available: http://deeplearning.net/software/theano/index.html. [Accessed 27 12 2015].
 "What is GPU Computing?," NVIDIA, [Online]. Available: http://www.nvidia.com/object/what-is-gpu-computing.html. [Accessed 27 12 2015].
 "GeForce 820M | Specifications," NVIDIA, [Online]. Available: http://www.geforce.com/hardware/notebook-gpus/geforce-820m/specifications. [Accessed 28 12 2015].
Editor's Notes
-> The computer sees an image as an array of numbers.
Each layer receives an input volume and transforms it along its depth to produce an output volume.
Neurons connect to sub-regions that are sensitive to only part of the entire image.
The convolution filter runs in patches over the entire image.
The result is a convolved output with fewer features than the input.
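The sliding-window ("valid") convolution the notes describe can be sketched directly; the 3x3 averaging kernel is an arbitrary example filter:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely inside
    # the image ("valid" mode): the output is smaller than the input,
    # as the notes describe.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0            # simple 3x3 averaging filter
out = conv2d_valid(image, kernel)
print(out.shape)   # (3, 3) — fewer outputs than the 5x5 input
```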