Deep Learning in Theano
Massimo Quadrana
PhD Student @ Politecnico di Milano
Research Intern @ Telefonica I+D
massimo.quadrana@polimi.it @mxqdr
Original slides are available here: https://goo.gl/VLYsnR

Before starting
OS: Linux / Mac OS (sorry Windows guys :) )
Required software:
Python 2.7.x, git, OpenBLAS
Optional software (for faster math and better packages/virtualenv support):
Anaconda (https://www.continuum.io/downloads)
Anaconda Intel MKL (free student licence) (https://www.continuum.io/anaconda-academic-subscriptions-available)

Before starting
Open your terminal and create a new virtualenv
> virtualenv -p /usr/bin/python2.7 theano-env
Activate the virtualenv
> source theano-env/bin/activate
Install the Theano package with its dependencies
> pip install Theano
(To exit the virtualenv)
> deactivate

Before starting
To check that your Theano env is correctly configured, run the following:
> python -c 'import theano'
It should complete without errors.

Before starting
Get the lab code here
> git clone https://github.com/mquad/DNN_Lab_UPF
Structure of the repo:
● exercise/: directory with the code for the lab (it won’t run)
● complete/: directory with the code completed with the missing parts (it should
run :-) )
● notebooks/: some Jupyter notebooks to show you some cool stuff
If you spot an error or have a feature request, open a new issue. I’ll do my
best to keep the repo up to date :-)

Outline
Image classification
● Logistic Regression
● “Modern” Multi-layer NN
● Convolutional Neural networks
Sequence Modeling
● Character Based RNN

Theano intro
Open your editor and write the following. Save it as example_mul.py, then run
> python example_mul.py

Theano intro
The official documentation:
http://deeplearning.net/software/theano/index.html

MNIST Dataset
60000 grayscale images (28 x 28 pixels each)
10 classes
(Figure: an example input image, the digit 8, flows through the model: Inputs, Computation, Outputs)

Logistic Regression on MNIST
(Figure: input image, then T.dot(X, W) + b, then softmax, giving one probability for each class from Zero to Nine)

Open exercise/logreg_raw.py
Many parts have already been coded for you (library import, data import and split,
evaluation)
Write the code for the Logistic Regression classifier

LogReg: input vars and model parameters
Shared variables in Theano maintain their state across functions.
Use them to store your model’s parameters.
When running on a GPU, shared variables are stored in GPU memory for faster
access.

LogReg: model and cost function
Softmax: generalization of sigmoid over multiple classes
Predicted class: class with maximum expected probability

LogReg: model and cost function
Cross-entropy loss
y is the one-hot encoding of the correct class of input features x (y_i = 1 iff the class of x is i)
Here we keep y as an integer and index into y_hat instead, which saves computation
Note: average the loss over the minibatch (the cost must be a scalar)
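To see why indexing y_hat with integer labels gives the same loss as the one-hot formulation, here is a small sketch in plain Python (not Theano). The function names and the probabilities are made up for illustration; they are not taken from the lab code.

```python
import math

def cross_entropy_one_hot(y_hat, y_one_hot):
    """Mean over the minibatch of -sum_i y_i * log(y_hat_i)."""
    total = 0.0
    for probs, target in zip(y_hat, y_one_hot):
        total -= sum(t * math.log(p) for p, t in zip(probs, target))
    return total / len(y_hat)

def cross_entropy_indexed(y_hat, y):
    """Same loss, reading only the probability of the correct class."""
    picked = [math.log(probs[label]) for probs, label in zip(y_hat, y)]
    return -sum(picked) / len(y_hat)

# Two examples, three classes (illustrative probabilities).
y_hat = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
y = [0, 1]                           # integer labels
y_one_hot = [[1, 0, 0], [0, 1, 0]]   # the same labels, one-hot encoded

loss_a = cross_entropy_one_hot(y_hat, y_one_hot)
loss_b = cross_entropy_indexed(y_hat, y)
print(loss_a, loss_b)
```

Both functions return a single scalar: the average over the minibatch.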

LogReg: SGD
T.grad() does the automatic differentiation of the loss function
updates tells Theano how to update the model (shared) parameters (it can be a
list of tuples, a dict or OrderedDict)
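The idea of an updates list can be sketched without Theano. In the plain-Python example below, the gradients are derived by hand (the job T.grad() does automatically) and the updates are built as a list of (parameter, new value) pairs. The toy least-squares problem, the function names and the learning rate are all assumptions made for illustration.

```python
def loss_and_grads(w, b, xs, ys):
    """Mean squared error of w * x + b, with hand-derived gradients."""
    n = len(xs)
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    grad_w = sum(2 * e * x for e, x in zip(errs, xs)) / n
    grad_b = sum(2 * e for e in errs) / n
    return loss, {"w": grad_w, "b": grad_b}

def sgd_updates(params, grads, lr):
    """Build (parameter name, new value) pairs, one per parameter."""
    return [(name, params[name] - lr * grads[name]) for name in params]

xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]   # generated by y = 2x + 1
params = {"w": 0.0, "b": 0.0}
first_loss, _ = loss_and_grads(params["w"], params["b"], xs, ys)

for _ in range(2000):
    _, grads = loss_and_grads(params["w"], params["b"], xs, ys)
    # Apply every update after all new values have been computed.
    for name, new_value in sgd_updates(params, grads, lr=0.05):
        params[name] = new_value

final_loss, _ = loss_and_grads(params["w"], params["b"], xs, ys)
print(params, final_loss)
```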

LogReg: Training, Loss and Predictions

LogReg: Softmax
The exp function can easily overflow: subtract the maximum value of x before
exponentiating to get more stable results (the output of softmax is unchanged)
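A minimal sketch of the max-subtraction trick in plain Python (not the Theano code from the lab; the function name and inputs are illustrative). Shifting every score by the same constant leaves the softmax output unchanged, because the constant cancels between numerator and denominator.

```python
import math

def softmax(x):
    """Softmax of a list of scores, shifted by max(x) for stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

small = softmax([1.0, 2.0, 3.0])
# Calling math.exp(1003.0) directly would raise OverflowError.
shifted = softmax([1001.0, 1002.0, 1003.0])
print(small)
print(shifted)
```

The two calls print the same probabilities up to floating-point rounding, since the second input is the first one plus 1000.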

LogReg:
File logreg.py contains a cleaner version of the Logistic Regression classifier.
init(): defines model parameters
model(): defines our model
fit() and predict(): fit the model on training data and predict the class of new
data

(Figure: the Logistic Regression model: T.dot(X, w), then softmax)
Test accuracy: ~92%

“Modern” multi-layer network
h0 = relu(T.dot(X, Wh0) + b0)
h1 = relu(T.dot(h0, Wh1) + b1)
y = softmax(T.dot(h1, Wy) + by)
(Figure: the network diagram, with noise injected at three points; one of the labels reads "Noise (or augmentation)")

Open and complete mlp.py. The missing parts are:
● init(): initialize the MLP parameters
● model(): define the model using dropout
● dropout(): apply dropout to the input
● apply_momentum(): apply momentum over the given updates
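Before filling in dropout() and apply_momentum(), it may help to see the two ideas on plain numbers. The sketch below is plain Python, not the lab's Theano code: it assumes "inverted" dropout (surviving units are rescaled by 1 / (1 - p)) and classical momentum, and its function signatures are hypothetical; the ones in mlp.py may differ.

```python
import random

def dropout(x, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if p <= 0.0:
        return list(x)
    retain = 1.0 - p
    return [v / retain if rng.random() < retain else 0.0 for v in x]

def momentum_step(param, velocity, grad, lr=0.1, mu=0.9):
    """Classical momentum: v <- mu * v - lr * g, then p <- p + v."""
    velocity = mu * velocity - lr * grad
    return param + velocity, velocity

rng = random.Random(0)
dropped = dropout([1.0] * 10000, p=0.5, rng=rng)
# Rescaling keeps the expected activation close to the original value of 1.
mean_activation = sum(dropped) / len(dropped)
print(mean_activation)

# Minimise f(p) = p**2 (gradient 2p) with momentum.
p, v = 5.0, 0.0
for _ in range(300):
    p, v = momentum_step(p, v, grad=2.0 * p)
print(p)
```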

(Figure: the multi-layer network with noise, same diagram as before)
Test accuracy: ~98%

Convolutional Neural Networks
(Figure from deeplearning.net)

CNNs in Theano
Open convnet.py and complete the following parts
● get_conv_output_shape(): compute the output shape of the convolutional
layer
● init(): complete the initialization of the convolutional filters
● model(): define the entire CNN model
● adagrad(): define the update rules for adagrad
● rmsprop(): define the update rules for rmsprop (easy if you do adagrad first)
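The two update rules differ only in how they accumulate squared gradients, which is why rmsprop() is easy once adagrad() is done. Here is a plain-Python sketch on a single scalar parameter. It is not the Theano code expected in convnet.py; the function names, learning rates, decay rate and epsilon are arbitrary choices made for illustration.

```python
import math

def adagrad_step(param, cache, grad, lr=0.5, eps=1e-6):
    """Adagrad: sum all squared gradients, scale the step by their root."""
    cache = cache + grad ** 2
    return param - lr * grad / (math.sqrt(cache) + eps), cache

def rmsprop_step(param, cache, grad, lr=0.01, rho=0.9, eps=1e-6):
    """RMSprop: like Adagrad, with a decaying average of squared gradients."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    return param - lr * grad / (math.sqrt(cache) + eps), cache

def minimise(step, start=5.0, n_steps=2000):
    """Run `step` on f(p) = p**2, whose gradient is 2p."""
    p, cache = start, 0.0
    for _ in range(n_steps):
        p, cache = step(p, cache, 2.0 * p)
    return p

p_adagrad = minimise(adagrad_step)
p_rmsprop = minimise(rmsprop_step)
print(p_adagrad, p_rmsprop)
```

Both runs should end close to the minimum at 0. In a real network the same rule is applied element-wise to every parameter tensor, with one cache per parameter.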

Dealing with Convolutions
Inputs have 3 dimensions: width and height (the spatial dimensions, W) and depth
Convolutions are
● local in width and height (receptive field F)
● full in depth

Dealing with Convolutions
Convolution hyper-parameters
● depth: number of neurons connected to the same input region
● stride: spacing between depth columns in the spatial dimensions
● padding: how to treat borders (not covered in the examples)
The spatial size of the output volume is given by the formula
(W - F + 2P) / S + 1
where W is the input spatial size, F the receptive field size, P the padding and S the stride
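As a quick check of the formula, here is a small plain-Python helper applied to a 28 x 28 MNIST image passing through two CONV(5,5) + MAX POOL stages. The helper name is made up, and the 2 x 2 pooling with stride 2 is an assumption (the pooling size is not stated here); a pooling layer is treated as a filter of size 2 with stride 2.

```python
def conv_output_size(w, f, s=1, p=0):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    span = w - f + 2 * p
    if span < 0 or span % s != 0:
        raise ValueError("filter does not tile the input with this stride/padding")
    return span // s + 1

after_conv1 = conv_output_size(28, 5)                 # (28 - 5) / 1 + 1 = 24
after_pool1 = conv_output_size(after_conv1, 2, s=2)   # (24 - 2) / 2 + 1 = 12
after_conv2 = conv_output_size(after_pool1, 5)        # (12 - 5) / 1 + 1 = 8
after_pool2 = conv_output_size(after_conv2, 2, s=2)   # (8 - 2) / 2 + 1 = 4
print(after_conv1, after_pool1, after_conv2, after_pool2)
```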

Our CNN (variation of LeNet5)
INPUT, CONV(5,5)*, MAX POOL, CONV(5,5)*, MAX POOL, FC*
*The actual number of filters and Fully Connected layers is programmable

CNNs: get_conv_output_volume()
We don’t consider padding for simplicity

CNNs: init()
First CONV(5,5), MAX POOL

CNNs: init()
Analogously for the second CONV(5,5), MAX POOL

Convolutional Neural Networks
(Figure from deeplearning.net)
Test accuracy: 99.5%

SGD/Adagrad/Rmsprop in training convnets

Recurrent Neural Networks
Open char_rnn/char_rnn_vanilla.py and complete the following
● init(): define and initialize the parameters of the Vanilla RNN
● model(): compute the updates of hidden states of the RNN
● model_sample(): compute the updates of the hidden state of the RNN after
only one step

RNN: model()
theano.scan() defines symbolic loops in Theano.
It has 4 main arguments (+ several additional ones):
● fn: function to be applied at every iteration
● sequences: variables scan has to iterate over (iteration is done over the first
dimension of each variable)
● outputs_info: initial state of the outputs computed recurrently
● non_sequences: list of additional arguments passed to fn
At each iteration, fn receives the parameters in the following order:
sequences (if any), outputs_info (if needed), non_sequences (if any)
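The argument order is easier to remember with a toy stand-in. The loop below is plain Python that mimics a single-output scan; it is not theano.scan() itself, which builds a symbolic graph and returns more than the outputs. The names scan_like and rnn_step and the scalar weights are hypothetical.

```python
import math

def scan_like(fn, sequences, outputs_info, non_sequences):
    """Plain-Python stand-in for a single-output scan loop (not Theano code)."""
    state = outputs_info
    outputs = []
    for items in zip(*sequences):   # iterate over the first dimension
        # fn receives: sequence elements, previous output, non-sequences.
        state = fn(*(list(items) + [state] + list(non_sequences)))
        outputs.append(state)
    return outputs

def rnn_step(x_t, h_prev, w_x, w_h):
    """One step of a scalar vanilla RNN."""
    return math.tanh(w_x * x_t + w_h * h_prev)

xs = [1.0, 0.5, -1.0]
hs = scan_like(rnn_step, sequences=[xs], outputs_info=0.0,
               non_sequences=[0.5, 0.8])
print(hs)
```

Each element of hs is the hidden state after consuming one more input, which is the role the recurrent output plays in the RNN's model().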

RNN - LSTM
Under the complete/ folder you have the code for the LSTM version of char-rnn
● char_rnn_lstm.py: standard LSTM
● char_rnn_lstm_fast.py: fast LSTM, makes better use of vectorized
operations (>2x faster)
● sampler.py: to sample from your RNN
EXERCISE: They differ from VanillaRNN in their init(), model(), model_sample()
and sampler() methods. Try to figure out how to go from one model to the other.

Additional remarks
How to choose the optimal hyperparameters of my DNN?
● Grid search (overly expensive)
● Bayesian Optimization (effective but quite complex)
● Random search (cheap, effective and easy to implement)
Check out mlp_opt.py to run random hyperparameter search for the MLP.
EXERCISE: Try with CNNs, RNNs
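Random search fits in a few lines. The sketch below is plain Python and does not reproduce mlp_opt.py: the hyperparameter names and ranges are hypothetical, and validation_loss is a stand-in for "train the model and score it on validation data".

```python
import math
import random

def sample_config(rng):
    """Draw one hypothetical configuration; the ranges are illustrative."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),   # log-uniform
        "n_hidden": rng.choice([128, 256, 512, 1024]),
        "dropout": rng.uniform(0.0, 0.6),
    }

def validation_loss(config):
    """Stand-in for training a model and scoring it on validation data."""
    return ((math.log10(config["learning_rate"]) + 2.5) ** 2
            + (config["dropout"] - 0.3) ** 2)

def random_search(n_trials, seed=0):
    """Sample n_trials configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=validation_loss)

best = random_search(n_trials=50)
print(best, validation_loss(best))
```

Sampling the learning rate on a log scale is a common choice, since useful values often span several orders of magnitude.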

Additional Remarks (2)
Packages worth checking
● Built on top of Theano: Lasagne, Keras
● Standalone packages: Caffe (Berkeley), TensorFlow (Google), CNTK
(Microsoft)
Repositories
● gitxiv.com

Credits
The slides and code used in this lab were inspired by great work from
some great Deep Learning researchers:
Alec Radford’s slides: “Introduction to Deep Learning with Python”, http://www.slideshare.net/indicods/general-sequence-learning-with-recurrent-neural-networks-for-next-ml
Andrej Karpathy’s blog post “The Unreasonable Effectiveness of Recurrent
Neural Networks”, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Andrej Karpathy’s char-rnn repo, https://github.com/karpathy/char-rnn
