© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Deep Learning with MXNet workshop
Sunil Mallya
Solutions Architect, Deep Learning
smallya@amazon.com
@sunilmallya
Agenda
• Deep Learning motivation and basics
• Apache MXNet overview
• MXNet programming model deep dive
• Train our first neural network using MXNet
Deep Learning basics
Biological Neuron
slide from http://cs231n.stanford.edu/
Neural Network basics: http://cs231n.github.io/neural-networks-1/
Artificial Neuron
(diagram: inputs scaled by synaptic weights → output)
• Input
Vector of training data x
• Output
Linear function of inputs
• Nonlinearity
Transform output into desired range
of values, e.g. for classification we
need probabilities [0, 1]
• Training
Learn the weights w and bias b
• Activation functions govern the behavior of
neurons.
• The transformation of inputs as they pass through
the network is called forward propagation.
• Activations are the values passed on to the next
layer from each previous layer. These values are
the output of the activation function of each
artificial neuron.
• Some of the more popular activation functions
include:
• Linear
• Sigmoid
• Hyperbolic tangent (tanh)
• ReLU
• Softmax
• Step function
Activation Functions
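The activation functions listed above can be sketched in a few lines of NumPy (a minimal illustration for intuition, not MXNet's implementations):

```python
import numpy as np

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes input into (-1, 1)
    return np.tanh(x)

def relu(x):
    # zeroes out negative inputs, passes positives through
    return np.maximum(0.0, x)

def softmax(x):
    # turns a vector of scores into probabilities summing to 1;
    # subtracting max(x) first keeps exp() numerically stable
    e = np.exp(x - np.max(x))
    return e / e.sum()
```

For classification, softmax is what maps the network's raw outputs into the [0, 1] probabilities mentioned on the earlier slide.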
Deep Neural Network
hidden layers
A common rule of thumb: the size of a hidden
layer (number of neurons) falls somewhere
between the size of the input layer and the size
of the output layer.
(diagram: input layer → hidden layers → output)
The “Learning” in Deep Learning
(diagram: input X with its label is fed forward through the weights, e.g. 0.4, 0.3, 0.2, 0.9, to produce a prediction X1)
When the prediction X1 != the label X, the error is fed back through backpropagation (gradient descent), nudging each weight by a small amount, e.g. 0.4 ± 𝛿, 0.3 ± 𝛿, to produce new weights.
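The weight-update idea on this slide can be illustrated with a one-parameter model fit by gradient descent (pure NumPy; the data, learning rate, and iteration count are hypothetical choices for the sketch):

```python
import numpy as np

# toy data generated from y = 3x; the "true" weight 3.0 is what we hope to learn
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0    # initial weight
lr = 0.02  # learning rate (the 0.2 on a later slide plays the same role)

for _ in range(200):
    pred = w * x                          # forward pass
    grad = 2.0 * ((pred - y) * x).mean()  # dLoss/dw for mean-squared-error loss
    w -= lr * grad                        # backprop step: w_new = w ± δ
```

After a few hundred steps `w` sits very close to the true value 3.0, which is the whole "learning" loop of the slide in miniature: forward pass, compare to label, push the error back into the weights.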
Apache MXNet
Apache MXNet
Programmable — simple syntax, multiple languages
Portable — highly efficient models for mobile and IoT
High Performance — near-linear scaling across hundreds of GPUs
88% efficiency on 256 GPUs; a 1024-layer ResNet is ~4GB
(chart: scaling vs. no. of GPUs, 1 to 256, for Inception v3, ResNet, and AlexNet against the ideal — ~88% efficiency)
• CloudFormation with Deep Learning AMI
• 16x P2.16xlarge, mounted on EFS
• Inception and ResNet: batch size 32; AlexNet: batch size 512
• ImageNet: 1.2M images, 1K classes
• 152-layer ResNet: 5.4 days on 4x K80s (1.2h per epoch), 0.22 top-1 error
Scaling with MXNet
http://bit.ly/deepami
Deep Learning any way you want on AWS
Tool for data scientists and developers
Setting up a DL system takes (install) time & skill
Keep packages up to date and compiled (MXNet, TensorFlow, Caffe, Torch,
Theano, Keras)
Anaconda, Jupyter, Python 2 and 3
NVIDIA Drivers for G2 and P2 instances
Intel MKL Drivers for all other instances (C4, M4, …)
Deep Learning AMIs
MXNet Programming model
import numpy as np
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
• Straightforward and flexible.
• Take advantage of language
native features (loop,
condition, debugger)
• E.g. Numpy, Matlab, Torch, …
• Hard to optimize
PROS
CONS
d = c + 1
Easy to tweak
with Python code
Imperative Programming
• More chances for optimization
• Cross different languages
• E.g. TensorFlow, Theano,
Caffe
• Less flexible
PROS
CONS
C can share memory with D
because C is deleted later
A = Variable('A')
B = Variable('B')
C = B * A
D = C + 1
f = compile(D)
d = f(A=np.ones(10),
B=np.ones(10)*2)
(computation graph: inputs A and B feed a × node, whose result feeds a + 1 node)
Declarative Programming
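The deferred-evaluation idea behind the declarative style can be sketched in plain Python (a toy stand-in, not MXNet's symbol API): expressions only record a graph, and compile() returns a function that evaluates the whole graph when data is supplied.

```python
import numpy as np

class Expr:
    # a node in the expression graph: an operator and its arguments
    def __init__(self, op, args):
        self.op, self.args = op, args
    def __mul__(self, other): return Expr('*', [self, other])
    def __add__(self, other): return Expr('+', [self, other])

def Variable(name):
    # placeholder node, bound to real data only at execution time
    return Expr('var', [name])

def compile(expr):  # shadows the builtin on purpose, to mirror the slide
    def evaluate(node, env):
        if node.op == 'var':
            return env[node.args[0]]
        left = evaluate(node.args[0], env)
        right = node.args[1]
        right = evaluate(right, env) if isinstance(right, Expr) else right
        return left * right if node.op == '*' else left + right
    return lambda **env: evaluate(expr, env)

A = Variable('A')
B = Variable('B')
C = B * A          # nothing is computed yet, only recorded
D = C + 1
f = compile(D)     # here a real framework would optimize the graph
d = f(A=np.ones(10), B=np.ones(10) * 2)
```

Because the full graph is known before execution, a framework can apply optimizations such as the memory sharing between C and D mentioned on the slide.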
IMPERATIVE (NDARRAY API):
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> c += 1
>>> print(c)
DECLARATIVE (SYMBOLIC EXECUTOR):
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=…)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> texec = mx.module.Module(net)
>>> texec.forward(data=c)
>>> texec.backward()
NDArray can be set
as input to the graph
MXNet: Mixed programming paradigm
Embed symbolic expressions into imperative programming
texec = mx.module.Module(net)
for batch in train_data:
    texec.forward(batch)
    texec.backward()
    for param, grad in zip(texec.get_params(), texec.get_grads()):
        param -= 0.2 * grad
MXNet: Mixed programming paradigm
Hands on Lab
https://github.com/dmlc/mxnet-notebooks/blob/master/python/tutorials/linear-regression.ipynb
Linear Regression
Linear regression
train_data = np.array([[1,2],[3,4],[5,6],[3,2],[7,1],[6,9]])
Y = aX + b, such that the error is minimized
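As a NumPy-only sketch of what the fit converges toward, here is an ordinary least-squares solve on the same train_data. The labels are generated from assumed weights [1.0, 2.0], chosen purely for illustration; the notebook's actual labels are not restated on this slide.

```python
import numpy as np

# same train_data as on the slide
X = np.array([[1, 2], [3, 4], [5, 6], [3, 2], [7, 1], [6, 9]], dtype=float)

# hypothetical ground-truth weights used to synthesize labels
w_true = np.array([1.0, 2.0])
y = X @ w_true

# minimizing the L2 loss for a linear model has a closed-form solution;
# gradient-based training (what MXNet does) approaches the same answer
w_fit, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With noise-free labels the recovered `w_fit` matches `w_true` almost exactly; with noisy labels (like the eval_label values later, which add 0.1) it would land nearby instead.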
Defining the Model
Variables: A variable is a placeholder for future data
X = mx.sym.Variable('data')
Y = mx.symbol.Variable('lin_reg_label')
Neural Network Layers: The layers of a network or any other type of model are also defined by
Symbols
fully_connected_layer = mx.sym.FullyConnected(data=X,
name='fc1', num_hidden = 1)
Output Symbols: Output symbols are MXNet's way of defining a loss
lro = mx.sym.LinearRegressionOutput(data=fully_connected_layer,
                                    label=Y, name="lro")
Layers: Fully Connected
A fully connected layer of a neural
network, without any activation applied, is
in essence just a linear regression on the
input attributes.
It takes the following parameters:
a. data: Input to the layer
b. num_hidden: # of hidden
dimensions, specifies the size of the
output of the layer
Layers: Linear Regression Output
Linear Regression Output: output layers in MXNet
implement a loss function.
Here we apply an L2 loss (least-squares error).
The parameters to this layer are:
a. data: Input to this layer (specify the symbol whose
output should be fed here)
b. label: The training label against which the layer's
input is compared when computing the L2 loss
Defining the Model
model = mx.mod.Module(
    symbol = lro,
    data_names = ['data'],
    label_names = ['lin_reg_label']  # network structure
)
model.fit(train_iter, eval_iter,
    optimizer_params={'learning_rate': 0.01, 'momentum': 0.9},
    num_epoch=1000,
    batch_end_callback=mx.callback.Speedometer(batch_size, 2))
Visualizing Networks
mx.viz.plot_network(symbol=lro)
Inference
#Evaluation Data
eval_data = np.array([[7,2],[6,10],[12,2]])
eval_label = np.array([11.1, 26.1, 16.1])  # adding 0.1 to each of the values
eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size,
                              shuffle=False)
#Inference
model.predict(eval_iter).asnumpy()
#Evaluation
metric = mx.metric.MSE()
model.score(eval_iter, metric)
MNIST Notebook
https://github.com/dmlc/mxnet-notebooks/blob/master/python/tutorials/mnist.ipynb
NDArray Data Iterator
import mxnet as mx
import numpy as np

def to4d(img):
    return img.reshape(img.shape[0], 1, 28, 28).astype(np.float32) / 255
batch_size = 100
train_iter = mx.io.NDArrayIter(to4d(train_img), train_lbl, batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(to4d(val_img), val_lbl, batch_size)
Each batch is a 4-D array with shape (batch_size, num_channels, width, height).
For the MNIST dataset there is only one color channel, and both width
and height are 28
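The reshape and scaling above can be checked with NumPy alone (synthetic pixel data standing in for MNIST; no MXNet required):

```python
import numpy as np

def to4d(img):
    # (N, 28, 28) uint8 images → (N, 1, 28, 28) float32 scaled to [0, 1]
    return img.reshape(img.shape[0], 1, 28, 28).astype(np.float32) / 255

# fake batch of 5 MNIST-sized images with random pixel values
batch = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)
x = to4d(batch)
```

The singleton channel axis is what lets the same iterator layout serve RGB datasets, where num_channels would be 3.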
• Input Layer: how input data (vectors) are fed
into the network. The number of neurons in the
input layer typically equals the number of input
features.
• Hidden Layer: the weight values on the connections
between layers are how an ANN encodes what it
learns. Hidden layers are crucial for learning
non-linear functions.
• Output Layer: represents the predictions; the
output can be a regression or a classification.
• Connections Between Layers: in a feed-forward
network, connections link each layer to the next.
Each connection has a weight, and the weights
together encode the knowledge of the network.
Neural Network basics: http://cs231n.github.io/neural-networks-1/
Feed Forward Network
Multilayer Perceptron
Y = WX + b
Network Definition
Model
model = mx.model.FeedForward(
    symbol = mlp,        # network structure
    num_epoch = 10,      # number of data passes for training
    learning_rate = 0.1  # learning rate of SGD
)
model.fit(
    X = train_iter,       # training data
    eval_data = val_iter, # validation data
    batch_end_callback = mx.callback.Speedometer(batch_size, 200)
                          # output progress every 200 data batches
)
Predictions and Validation
# prediction on a single image
prob = model.predict(val_img[0:1].astype(np.float32)/255)[0]
# get the class with highest probability
print 'Classified as %d with probability %f' % (prob.argmax(), max(prob))
# run the model on the validation set and calculate the score with eval_metric
valid_acc = model.score(val_iter)
Convolutional Neural Network (CNN)
A CNN arranges neurons in 3
dimensions (width, height, depth)
CNN Layers
Convolutional Layer
Pooling Layer
Activation
Fully-Connected Layer
Convolutions
More info: http://cs231n.github.io/convolutional-networks/
Pooling Layer
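Max pooling can be sketched in a few lines of NumPy (a minimal 2x2, stride-2 version on a single feature map; a real layer such as MXNet's pooling operator also handles batches, channels, padding, and arbitrary strides):

```python
import numpy as np

def max_pool_2x2(x):
    # non-overlapping 2x2 max pooling on an (H, W) feature map; H and W even
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# toy 4x4 feature map for illustration
fmap = np.array([[1, 2, 0, 1],
                 [4, 3, 1, 0],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])
pooled = max_pool_2x2(fmap)  # each output cell is the max of one 2x2 block
```

Pooling halves the spatial resolution while keeping the strongest response in each region, which is what gives CNNs a degree of translation tolerance.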
Model Definition
Running the model
model = mx.model.FeedForward(
    ctx = mx.gpu(0),  # use GPU 0 for training; other settings same as before
    symbol = lenet,
    num_epoch = 10,
    learning_rate = 0.1)
model.fit(
    X = train_iter,
    eval_data = val_iter,
    batch_end_callback = mx.callback.Speedometer(batch_size, 200)
)
Visualizing hidden layers
(feature visualizations for Layer 1, Layer 2, and Layer 3)
LeNet-5 Architecture
img src: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
Thank You
smallya@amazon.com
sunilmallya
Mais conteúdo relacionado

Mais procurados

Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Simplilearn
 

Mais procurados (20)

Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
 
CIFAR-10
CIFAR-10CIFAR-10
CIFAR-10
 
Anomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-EncodersAnomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-Encoders
 
Reinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face TransformersReinventing Deep Learning
 with Hugging Face Transformers
Reinventing Deep Learning
 with Hugging Face Transformers
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
Deep learning crash course
Deep learning crash courseDeep learning crash course
Deep learning crash course
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
Introduction to Keras
Introduction to KerasIntroduction to Keras
Introduction to Keras
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Image classification using CNN
Image classification using CNNImage classification using CNN
Image classification using CNN
 
Notes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew NgNotes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew Ng
 
Federated learning
Federated learningFederated learning
Federated learning
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Deep learning
Deep learningDeep learning
Deep learning
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential Data
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
Introduction of Deep Learning
Introduction of Deep LearningIntroduction of Deep Learning
Introduction of Deep Learning
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised Learning
 

Semelhante a Scalable Deep Learning Using Apache MXNet

Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
Databricks
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
Mohid Nabil
 

Semelhante a Scalable Deep Learning Using Apache MXNet (20)

MXNet Workshop
MXNet WorkshopMXNet Workshop
MXNet Workshop
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications Startup.Ml: Using neon for NLP and Localization Applications
Startup.Ml: Using neon for NLP and Localization Applications
 
AI powered emotion recognition: From Inception to Production - Global AI Conf...
AI powered emotion recognition: From Inception to Production - Global AI Conf...AI powered emotion recognition: From Inception to Production - Global AI Conf...
AI powered emotion recognition: From Inception to Production - Global AI Conf...
 
AI powered emotion recognition: From Inception to Production - Global AI Conf...
AI powered emotion recognition: From Inception to Production - Global AI Conf...AI powered emotion recognition: From Inception to Production - Global AI Conf...
AI powered emotion recognition: From Inception to Production - Global AI Conf...
 
Power ai tensorflowworkloadtutorial-20171117
Power ai tensorflowworkloadtutorial-20171117Power ai tensorflowworkloadtutorial-20171117
Power ai tensorflowworkloadtutorial-20171117
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
 
Towards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprogramsTowards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprograms
 
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
Multilayer Perceptron - Elisa Sayrol - UPC Barcelona 2018
 
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksA Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
 
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech TalksA Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
A Deeper Dive into Apache MXNet - March 2017 AWS Online Tech Talks
 
Introduction To Tensorflow
Introduction To TensorflowIntroduction To Tensorflow
Introduction To Tensorflow
 
Deep learning (2)
Deep learning (2)Deep learning (2)
Deep learning (2)
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)
 
Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018Apache MXNet ODSC West 2018
Apache MXNet ODSC West 2018
 
Android and Deep Learning
Android and Deep LearningAndroid and Deep Learning
Android and Deep Learning
 
Cnn
CnnCnn
Cnn
 
Scaling Deep Learning with MXNet
Scaling Deep Learning with MXNetScaling Deep Learning with MXNet
Scaling Deep Learning with MXNet
 
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017
 

Mais de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
raffaeleoman
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
Sheetaleventcompany
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
Kayode Fayemi
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
Kayode Fayemi
 

Último (20)

ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptx
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 

Scalable Deep Learning Using Apache MXNet

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved Deep Learning with MXNet workshop Sunil Mallya Solutions Architect, Deep Learning smallya@amazon.com @sunilmallya
  • 2. Agenda • Deep Learning motivation and basics • Apache MXNet overview • MXNet programing model deep dive • Train our first neural network using MXNet
  • 3. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved Deep Learning basics
  • 4. Biological Neuron slide from http://cs231n.stanford.edu/ Neural Network basics: http://cs231n.github.io/neural-networks-1/
  • 5. Artificial Neuron output synaptic weights • Input Vector of training data x • Output Linear function of inputs • Nonlinearity Transform output into desired range of values, e.g. for classification we need probabilities [0, 1] • Training Learn the weights w and bias b
  • 6. • Activation functions governs behavior of neurons. • Transition of input is called forward propagation. • Activations are the values passed on to the next layer from each previous layer. These values are the output of the activation function of each artificial neuron. • Some of the more popular activation functions include: • Linear • Sigmoid • Hiberbolic Tangant • Relu • Softmax • Step function Activation Functions
  • 7. Deep Neural Network hidden layers The optimal size of the hidden layer (number of neurons) is usually between the size of the input and size of the output layers Input layer output
  • 8. The “Learning” in Deep Learning 0.4 0.3 0.2 0.9 ... back propogation (gradient descent) X1 != X 0.4 ± 𝛿 0.3 ± 𝛿 new weights new weights 0 1 0 1 1 . . - - X input label ... X1
  • 9. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved Apache MXNet
  • 10. Apache MXNet Programmable Portable High Performance Near linear scaling across hundreds of GPUs Highly efficient models for mobile and IoT Simple syntax, multiple languages 88% efficiency on 256 GPUs Resnet 1024 layer network is ~4GB
  • 11. Ideal Inception v3 Resnet Alexnet 88% Efficiency 1 2 4 8 16 32 64 128 256 No. of GPUs • Cloud formation with Deep Learning AMI • 16x P2.16xlarge. Mounted on EFS • Inception and Resnet: batch size 32, Alex net: batch size 512 • ImageNet, 1.2M images,1K classes • 152-layer ResNet, 5.4d on 4x K80s (1.2h per epoch), 0.22 top-1 error Scaling with MXNet
  • 12.
  • 13. http://bit.ly/deepami Deep Learning any way you want on AWS Tool for data scientists and developers Setting up a DL system takes (install) time & skill Keep packages up to date and compiled (MXNet, TensorFlow, Caffe, Torch, Theano, Keras) Anaconda, Jupyter, Python 2 and 3 NVIDIA Drivers for G2 and P2 instances Intel MKL Drivers for all other instances (C4, M4, …) Deep Learning AMIs
  • 14. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved MXNet Programing model
  • 15. import numpy as np a = np.ones(10) b = np.ones(10) * 2 c = b * a • Straightforward and flexible. • Take advantage of language native features (loop, condition, debugger) • E.g. Numpy, Matlab, Torch, … • Hard to optimize PROS CONS d = c + 1c Easy to tweak with python codes Imperative Programing
  • 16. • More chances for optimization • Cross different languages • E.g. TensorFlow, Theano, Caffe • Less flexible PROS CONS C can share memory with D because C is deleted later A = Variable('A') B = Variable('B') C = B * A D = C + 1 f = compile(D) d = f(A=np.ones(10), B=np.ones(10)*2) A B 1 + X Declarative Programing
  • 17. IMPERATIVE NDARRAY API DECLARATIVE SYMBOLIC EXECUTOR >>> import mxnet as mx >>> a = mx.nd.zeros((100, 50)) >>> b = mx.nd.ones((100, 50)) >>> c = a + b >>> c += 1 >>> print(c) >>> import mxnet as mx >>> net = mx.symbol.Variable('data') >>> net = mx.symbol.FullyConnected(data=net, num_hidde >>> net = mx.symbol.SoftmaxOutput(data=net) >>> texec = mx.module.Module(net) >>> texec.forward(data=c) >>> texec.backward() NDArray can be set as input to the graph MXNet: Mixed programming paradigm
  • 18. Embed symbolic expressions into imperative programming texec = mx.module.Module(net) for batch in train_data: texec.forward(batch) texec.backward() for param, grad in zip(texec.get_params(), texec.get_grads()): param -= 0.2 * grad MXNet: Mixed programming paradigm
  • 19. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved Hands on Lab
  • 21. Linear regression train_data = np.array([[1,2],[3,4],[5,6],[3,2],[7,1],[6,9]]) Fit Y = aX + b such that the error is minimized
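Before handing the problem to MXNet, it helps to see what "minimize the error" means concretely. Below is a plain-Python sketch of gradient descent on the squared error for y = a·x + b, using made-up 1-D data generated from a = 2, b = 1 (not the slide's train_data):

```python
# Toy data: y = 2x + 1, so gradient descent should recover a≈2, b≈1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

a, b, lr = 0.0, 0.0, 0.02
for _ in range(2000):
    grad_a = grad_b = 0.0
    for x, y in data:
        err = (a * x + b) - y          # prediction error
        grad_a += 2 * err * x          # d/da of err^2
        grad_b += 2 * err              # d/db of err^2
    a -= lr * grad_a / len(data)       # step against the mean gradient
    b -= lr * grad_b / len(data)

print(round(a, 2), round(b, 2))  # → 2.0 1.0
```

This is the same loop `model.fit` runs internally, with MXNet computing the gradients for you via backpropagation.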
  • 22. Defining the Model Variables: A variable is a placeholder for future data X = mx.sym.Variable('data') Y = mx.symbol.Variable('lin_reg_label') Neural Network Layers: The layers of a network or any other type of model are also defined by Symbols fully_connected_layer = mx.sym.FullyConnected(data=X, name='fc1', num_hidden = 1) Output Symbols: Output symbols are MXNet's way of defining a loss lro = mx.sym.LinearRegressionOutput(data=fully_connected_layer, label=Y, name="lro")
  • 23. Layers: Fully Connected A fully connected layer of a neural network (without any activation applied) is, in essence, just a linear regression on the input attributes. It takes the following parameters: a. data: input to the layer b. num_hidden: number of hidden dimensions; specifies the size of the layer's output
  • 24. Layers: Linear Regression Output Output layers in MXNet implement a loss function; here we apply an L2 loss (least-squares error). The parameters to this layer are: a. data: input to this layer (the symbol whose output should be fed here) b. label: the training label against which the layer's input is compared when computing the L2 loss
  • 25. Defining the Model model = mx.mod.Module( symbol=lro, # network structure data_names=['data'], label_names=['lin_reg_label'] ) model.fit(train_iter, eval_iter, optimizer_params={'learning_rate': 0.01, 'momentum': 0.9}, num_epoch=1000, batch_end_callback=mx.callback.Speedometer(batch_size, 2))
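The `optimizer_params` above configure SGD with momentum. One common formulation of that update rule, sketched in plain Python with the slide's hyperparameters (MXNet's built-in optimizer additionally handles weight decay, gradient rescaling, and clipping):

```python
# Momentum SGD: keep a running "velocity" so consistent gradients accelerate.
def sgd_momentum_step(weight, grad, velocity, lr=0.01, momentum=0.9):
    velocity = momentum * velocity - lr * grad   # accumulate a running direction
    return weight + velocity, velocity

w, v = 1.0, 0.0
for g in [0.5, 0.5, 0.5]:     # pretend the gradient stays constant for 3 steps
    w, v = sgd_momentum_step(w, g, v)
print(round(w, 6), round(v, 6))
```

Note how the step size grows across the three iterations (0.005, 0.0095, 0.01355) even though the gradient is constant — that is the momentum term at work.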
  • 27. Inference #Evaluation data (adding 0.1 to each of the label values) eval_data = np.array([[7,2],[6,10],[12,2]]) eval_label = np.array([11.1,26.1,16.1]) eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size, shuffle=False) #Inference model.predict(eval_iter).asnumpy() #Evaluation metric = mx.metric.MSE() model.score(eval_iter, metric)
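What the MSE metric computes can be written out in a few lines of plain Python. Using the slide's evaluation labels (each perturbed by 0.1), a model that predicted the unperturbed values would score:

```python
def mse(preds, labels):
    """Mean squared error, as an MSE metric computes it."""
    return sum((p - l) ** 2 for p, l in zip(preds, labels)) / len(preds)

# Hypothetical predictions vs. the slide's eval_label values:
score = mse([11.0, 26.0, 16.0], [11.1, 26.1, 16.1])
print(score)  # ≈ 0.01 (each prediction is off by 0.1; 0.1² = 0.01)
```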
  • 29. NDArray Data Iterator import mxnet as mx def to4d(img): return img.reshape(img.shape[0], 1, 28, 28).astype(np.float32)/255 batch_size = 100 train_iter = mx.io.NDArrayIter(to4d(train_img), train_lbl, batch_size, shuffle=True) val_iter = mx.io.NDArrayIter(to4d(val_img), val_lbl, batch_size) Batch of 4-D matrix with shape (batch_size, num_channels, width, height) For the MNIST dataset, there is only one color channel, and both width and height are 28
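The batching behavior of `NDArrayIter` can be sketched in plain Python (shuffling and the 4-D reshape omitted; data and sizes below are illustrative):

```python
def iter_batches(data, labels, batch_size):
    """Yield (data, label) batches in order, like NDArrayIter without shuffle.
    The last batch may be smaller if the dataset size is not a multiple."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size], labels[start:start + batch_size]

# Toy "images": scale raw pixel values from [0, 255] to [0, 1], as to4d() does.
raw = [[0, 128, 255]] * 10
images = [[p / 255.0 for p in row] for row in raw]
labels = list(range(10))

batches = list(iter_batches(images, labels, batch_size=4))
print([len(x) for x, _ in batches])  # [4, 4, 2]
```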
  • 30. • Input Layer: This layer is how we get input data (vectors) fed into our network. The number of neurons in an input layer is typically the same number as the input feature to the network. • Hidden Layer: The weight values on the connections between the layers are how an ANN encodes what it learns. Hidden layers are crucial in learning non-linear functions. • Output Layer: Output layer represents predictions. Output can be regression or classification. • Connections Between Layers: In a feed-forward network connections link a layer to the next layer of an ANN. Each connection has a weight. The weights of connections are encoding of the knowledge of the network. Neural Network basics: http://cs231n.github.io/neural-networks-1/ Feed Forward Network
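The forward pass described above — inputs flowing through weighted connections and an activation into the next layer — fits in a few lines of plain Python. The weights below are made up purely for illustration:

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: out_j = sum_i inputs[i] * weights[j][i] + biases[j]."""
    return [sum(i * w for i, w in zip(inputs, neuron)) + b
            for neuron, b in zip(weights, biases)]

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

x = [1.0, 0.5]                                  # 2 input features
W1 = [[0.1, 0.2], [0.3, -0.1]]; b1 = [0.0, 0.1] # hidden layer: 2 neurons
W2 = [[0.5, -0.5]];             b2 = [0.2]      # output layer: 1 neuron

hidden = sigmoid(dense(x, W1, b1))  # non-linearity makes deeper layers useful
output = dense(hidden, W2, b2)
print(output)
```

This is exactly what `mx.sym.FullyConnected` plus an activation symbol builds, just without the vectorization and autograd.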
  • 33. Model model = mx.model.FeedForward( symbol = mlp, # network structure num_epoch = 10, # number of data passes for training learning_rate = 0.1 # learning rate of SGD ) model.fit( X=train_iter, # training data eval_data=val_iter, # validation data batch_end_callback = mx.callback.Speedometer(batch_size, 200) # output progress for each 200 data batches )
  • 34. Predictions and Validation # prediction on a single image prob = model.predict(val_img[0:1].astype(np.float32)/255)[0] # get the class with highest probability print 'Classified as %d with probability %f' % (prob.argmax(), max(prob)) # Run the model on the validation set and calculate the score with eval_metric. valid_acc = model.score(val_iter)
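The `prob` vector above comes from a softmax output, and `argmax` picks the most likely class. Both are easy to sketch in plain Python (scores below are made up):

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

prob = softmax([1.0, 3.0, 0.5])                          # scores for 3 classes
predicted = max(range(len(prob)), key=prob.__getitem__)  # argmax
print(predicted, round(max(prob), 3))  # class 1 wins
```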
  • 35. Convolutional Neural Network (CNN) A CNN arranges neurons in 3 dimensions (width, height, depth) CNN layers: Convolutional Layer, Pooling Layer, Activation, Fully-Connected Layer
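The two layer types that distinguish a CNN — convolution and pooling — can be sketched in plain Python on a tiny single-channel image (toy data, no padding or stride options):

```python
def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling: keep the strongest activation per block."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

img = [[i + j for j in range(5)] for i in range(5)]  # 5x5 toy image
fmap = conv2d(img, [[1, 0], [0, 1]])                 # 4x4 feature map
print(max_pool_2x2(fmap))  # [[6, 10], [10, 14]]
```

MXNet's `Convolution` and `Pooling` symbols do the same per channel and filter, batched and on the GPU.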
  • 39. Running the model model = mx.model.FeedForward( ctx = mx.gpu(0), # use GPU 0 for training, others are same as before symbol = lenet, num_epoch = 10, learning_rate = 0.1) model.fit( X=train_iter, eval_data=val_iter, batch_end_callback = mx.callback.Speedometer(batch_size, 200) )
  • 41. LeNet-5 Architecture img src: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
  • 42. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved Thank You smallya@amazon.com sunilmallya

Editor's Notes

  1. Learn about the features and benefits of Apache MXNet. Learn about the deep learning AMIs with the tools you need for DL. Learn how to train a neural network using MXNet.
  2. Cell body – soma; dendrites are appendages that listen to other neurons. A single axon carries the output of the computation the neuron performs. The cell body receives multiple inputs, and if they align, the neuron can spike, sending an action potential down the axon, which branches out to other neurons. Neurons are connected through synapses. Crude model: each neuron-to-neuron connection is through a synapse and has a weight, which is a function of "how much does this neuron like the other neuron".
  3. For computational efficiency, neurons are arranged in layers.
  4. Hard to define a network by hand: the definition of the Inception network is >1k lines of code in Caffe. Memory consumption is linear in the number of layers.
  5. Executes operations step by step: c = b ⨉ a invokes a kernel operation. NumPy programs are imperative.
  6. Declares the computation, then compiles it into a function. C = B ⨉ A only specifies the requirement. SQL is declarative.
  7. @zz: “data_shape” is confusing, one would expect it to bind with some input data, not "shape"
  8. Executes operations step by step: c = b ⨉ a invokes a kernel operation. NumPy programs are imperative.