SlideShare uma empresa Scribd logo
1 de 82
Baixar para ler offline
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Building Deep Learning
Applications with Apache MXNet
and Gluon
V i k r a m M a d a n & C y r u s V a h i d
N o v e m b e r 2 8 , 2 0 1 7
M C L 3 1 0
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Apache MXNet and gluon
Deep Learning Fundamentals
Convolutional Neural Networks
Recurrent Neural Networks and LSTM
gluon API
B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Apache MXNet and gluon
Deep Learning Fundamentals
Convolutional Neural Networks
Recurrent Neural Networks and LSTM
gluon API
B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Goals of Machine Learning
Source: http://cs231n.github.io/neural-networks-1/
• Starting with unlabeled data
• Labeling the data as accurately as possible
• Providing a function that probabilistically approximates a prediction
based on input data as accurately as possible.
𝑓 𝑉 = 𝑌
𝑉 = 𝑣1, 𝑣2, … , 𝑣 𝑛
https://giphy.com/gifs/machine-learning-rfPnhoMtbeuwE
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Limitations of Statistical Models
• Not very good at dealing with high-
dimensional data.
• Missing structural relatedness.
• Not very good at picking out complicated
patterns.
• Require strong domain knowledge to
perform feature engineering
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Biological & Artificial Neuron
Source: http://cs231n.github.io/neural-networks-1/
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Perceptron
I1 I2 B
O
w1 w2 w3
𝑓 𝑥𝑖, 𝑤𝑖 = Φ(𝑏 + Σ𝑖(𝑤𝑖. 𝑥𝑖))
Φ 𝑥 = ቊ
1, 𝑖𝑓 𝑥 ≥ 0.5
0, 𝑖𝑓 𝑥 < 0.5
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Perceptron
I1 I2 B
O
1 1 -1.5
𝑂1 = 1𝑥1 + 1𝑥1 + −1.5 = 0.5 ∴ Φ(𝑂1) = 1
𝐼1 = 𝐼2 = 𝐵1 = 1
𝑂1 = 1𝑥1 + 0𝑥1 + −1.5 = −0.5 ∴ Φ(𝑂1) = 0
𝐼2 = 0 ; 𝐼1 = 𝐵1 = 1
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Non-Linearity
P Q P ∧ Q P ⨁ Q
T T T T
T F F F
F T F F
F F F T
P
Q
x0
0 0
P
Q
x0
x 0
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Learning
hidden layersInput layer
output
Add Non Linearity to output of hidden layer
To transform output into continuous range
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The “Learning” in Deep Learning
0.4 0.3
0.2 0.9
...
backpropagation (gradient descent)
X1 != X
0.4 ± 𝛿 0.3 ± 𝛿
new
weights
new
weights
0
1
0
1
1
.
.
-
X
input
label
...
X1
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activation Functions
Add nonlinearity to a layer and
applied to the layer’s output
There are several options:
 Rectified Linear Unit (ReLU)
 Sigmoid
 Hyperbolic Tangent (tanh)
 Softplus
ReLU functions are the most
commonly used today
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Linear Algebra & Matrix Multiplication
Requirement
# of Columns in A
must equal
# of Rows in B
Output
# of Rows in A
# of Columns in B
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Matrix Multiplication with Neural Networks
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inputs: Preprocessing, Batches, Epochs
Preprocessing
 Random separation of data into
training, validation, and test sets
 Necessary to measuring the
accuracy of the model
Batch
 Amount of data propagated
through network at every iteration
 Enables faster optimization
through shorter iteration cycles
Epoch
 Complete pass through all the
training data
 Optimization will have multiple
epochs to reduce error rate
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inputs: Encoding MNIST data
https://www.tensorflow.org/get_started/mnist/beginners
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inputs: Encoding Pictures into Data
7 x 7 x 3 Matrix
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Inference
• Regression
• Binary Classification
• Classification
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Classification with the Softmax Function
Softmax converts the output layer into probabilities – necessary for classification
Softmax Function
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Loss Function
• It is an objective function that quantifies how
successful the model was in its predictions
• It is a measure of the difference between a neural net’s
prediction and the actual value – that is, the error
• Typically, we use Cross Entropy Loss, which adjusts the
plain loss calculation to mitigate learning slowdown
• Backpropagation is performed to calculate the error
contribution of each neuron after processing one batch
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Gradient Descent
Iteratively update parameters to get the most optimal value for the objective function
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Gradient Descent
Iteratively update parameters to get the most optimal value for the objective function
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Stochastic Gradient Descent
Gradient Descent
A single iteration for the
parameter update runs through
ALL of the training data
Stochastic Gradient Descent,
A single iteration for the
parameter update runs through
a BATCH of the training data
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Optimizers
http://imgur.com/a/Hqolp
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Learning Rates
• Learning Rate: It is a real number
that decides how far to move down in
the direction of steepest gradient
• Online Learning: Weights are
updated at each step (slow to learn)
• Batch Learning: Weights are
updated after all training data is
processed (hard to optimize)
• Mini-Batch: Combination of both
when we break up the training set
into smaller batches and update the
weights after each mini-batch
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Training and Validation Data
Best model
When only evaluating accuracy using the training set, we face the Overfitting issue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Overfitting, Regularization, and Dropout
Srivastava, Nitish, et al. ”Dropout: a simple way to prevent neural networks from
overfitting”, JMLR 2014
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
mnist = mx.test_utils.get_mnist()
batch_size = 64
num_inputs = 784
num_outputs = 10
def transform(data, label):
return data.astype(np.float32)/255, label.astype(np.float32)
train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform), batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform), batch_size, shuffle=False)
Load the Data
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
num_hidden = 256
net = gluon.nn.Sequential()
with net.name_scope():
net.add(gluon.nn.Dense(num_hidden, activation="relu"))
net.add(gluon.nn.Dense(num_hidden, activation="relu"))
net.add(TestDense(128))
net.add(gluon.nn.Dense(10))
Define The Model
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
class TestDense(gluon.Block):
def __init__(self, num_dense, **kwargs):
super(TestDense, self).__init__(**kwargs)
self.num_dense = num_dense
self.new_dense = gluon.nn.Dense(num_dense, activation="relu")
def forward(self, x):
return self.new_dense(x)
Custom Dense Layer
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
Initialize the Model
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
l2loss = gluon.loss.L2Loss()
Define Loss Function
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .01, ‘momentum’: .9})
Choose an Optimizer
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
epochs = 10
for e in range(epochs):
for i, (data, label) in enumerate(train_data):
data = data.as_in_context(ctx).reshape((-1, 784))
label = label.as_in_context(ctx)
with autograd.record():
output = net(data)
loss = softmax_cross_entropy(output, label)
loss.backward()
trainer.step(data.shape[0])
Train the Model
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Apache MXNet and gluon
Deep Learning Fundamentals
Convolutional Neural Networks
Recurrent Neural Networks and LSTM
gluon API
B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Framing Problem and Spatial Correlation
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
MLP and Framing
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
MLP and Framing Problem
• Input Layer takes the raw array of data.
• Feature Extraction layers extract features through:
• The first layer performs several convolutions in parallel to produce a set of linear activations.
• In the second stage (detector), each linear activation is run through a nonlinear activation function, such as ReLU
• The third layer performs pooling on the output
• In the end fully-connected layers, reassemble the features into final output and apply Softmax to predict create a
probabilistic distribution.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Convolution Neural Networks (CNN)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Convolutions
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Convolutions
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pooling and Strides
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pooling Output
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Full Convolution Neural Network Structure
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
num_fc = 512
net = gluon.nn.Sequential()
with net.name_scope():
net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
# The Flatten layer collapses all axis, except the first one, into one axis.
net.add(gluon.nn.Flatten())
net.add(gluon.nn.Dense(num_fc, activation="relu"))
net.add(gluon.nn.Dense(num_outputs))
Build CNN in gluon
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Apache MXNet and gluon
Deep Learning Fundamentals
Convolutional Neural Networks
Recurrent Neural Networks and LSTM
gluon API
B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
Seine steuerliche Bewertung wäre günstiger gewesen, wenn seine
Steuererklärung ür das Geschäftsjahr 1927 früher überprüft worden wäre
His tax assessment would have been more favorable if his tax return for fiscal
year 1927 had been checked earlier.
Seine steuerliche Bewertung wäre günstiger gewesen, wenn seine
Steuererklärung für das Geschäftsjahr 1927 früher überprüft worden
wäre
His tax assessment would have been more favorable if his tax return for
fiscal year 1927 had been checked earlier.
Who?
Seine steuerliche Bewertung wäre günstiger gewesen, wenn seine
Steuererklärung ür das Geschäftsjahr 1927 früher überprüft worden wäre
His tax assessment would have been more favorable if his tax return for
fiscal year 1927 had been checked earlier.
Who?
Handling Long-Term Dependencies
• The Spanish King officially abdicated in … of his …, Felipe. Felipe will be
confirmed tomorrow as the new Spanish … .
Sequences and Long-Term Dependency
• The Spanish King officially abdicated in favour of his son, Felipe. Felipe
will be confirmed tomorrow as the new Spanish King .
Sequences and Long-Term Dependency
• Their duo was known as "Buck and Bubbles”.
Bubbles tapped along as Buck played on the stride piano and sang until
Bubbles was totally tapped out.
Sequences and Long-Term Dependency
• CNN is news organization
• CNN is short for Convolutional Neural Network
• In our quest to implement perfect NLP tools, we have developed state of
the art RNNs. Now we can use them to … .
Sequences and Long-Term Dependency
• CNN is news organization
• CNN is short for Convolutional Neural Network
• In our quest to implement perfect NLP tools, we have developed state of
the art RNNs. Now we can use them to wreck a nice beach. (Geoffrey
Hinton – Coursera)
Sequences and Long-Term Dependency
Computational Graphs
x y
z
x
Let x and y be matrices
and z the dot product;
𝑧 = 𝑥 ⋅ 𝑦
ො𝑦 = 𝑥 ⋅ 𝑤
𝑤𝑒𝑖𝑔ℎ𝑡 𝑑𝑒𝑐𝑎𝑦 𝑝𝑒𝑛𝑎𝑙𝑡𝑦 = 𝜆Σ𝑖 𝑤𝑖
2
x w
ො𝑦
dot
𝑢(1)
sqr
𝑢(2)
sum
𝜆
𝑢(3)
x
Dynamical Systems and Timesteps
• 𝑠 𝑡
= 𝑓 𝑠 𝑡 −1
; 𝜃 ; where 𝜃 𝑖𝑠 𝑚𝑜𝑑𝑒𝑙 𝑝𝑎𝑟𝑎𝑚𝑠. (A)
For instance if t = 3 then:
𝑠(3)
= 𝑓 𝑠 2
; 𝜃 = 𝑓 𝑓 𝑠 1
; 𝜃 ; 𝜃
• 𝑠 𝑡
= 𝑓 𝑠 𝑡 −1
, 𝑥(𝑡)
; 𝜃 ; 𝑤ℎ𝑒𝑟𝑒 𝑥 𝑖𝑠 𝑒𝑥𝑡𝑒𝑟𝑛𝑎𝑙 𝑠𝑖𝑔𝑛𝑎𝑙 𝑠𝑢𝑐ℎ 𝑎𝑠 𝑖𝑛𝑝𝑢𝑡 𝑣𝑒𝑐𝑡𝑜𝑟 (B)
• ℎ 𝑡
= 𝑓 ℎ 𝑡 −1
, 𝑥(𝑡)
; 𝜃 ; 𝑤ℎ𝑒𝑟𝑒 ℎ 𝑖𝑠 𝑡ℎ𝑒 𝑠𝑡𝑎𝑡𝑒 𝑜𝑓 ℎ𝑖𝑑𝑑𝑒𝑛 𝑛𝑜𝑑𝑒 (C)
x
h
Unfold
𝑥(𝑡−1)
ℎ(𝑡−1)ℎ(… )
𝑓
𝑥(𝑡)
𝑡(𝑡)
𝑓
𝑥(𝑡+1)
ℎ(𝑡+1)
𝑓
ℎ(… )
http://www.deeplearningbook.org/
RNN Unrolling Through Time
0 RNN
𝑥1 = 𝑒𝑚𝑏 ”𝑖”
t = 0
RNN
𝑥2 = 𝑒𝑚𝑏 ”𝑎𝑚”
t = 1
RNN
𝑥3 = 𝑒𝑚𝑏 ”𝑓𝑟𝑜𝑚”
t = 2
softmax
𝑦
RNN
𝑥4 = 𝑒𝑚𝑏 ”𝑒𝑎𝑟𝑡ℎ”
t = 3
X = I am from Earth
𝑝(“𝑖”)
t = 0
𝑝(“𝑎𝑚”|”𝑖”)
t = 1
𝑝(“𝑓𝑟𝑜𝑚”|”𝑎𝑚”, “𝑖”)
t = 2
𝑝(“𝑒𝑎𝑟𝑡ℎ”|“𝑓𝑟𝑜𝑚”, ”𝑎𝑚”, “𝑖”)
t = 3
ℎ 𝑡 = ቊ
𝑡𝑎𝑛ℎ(𝑊ℎ𝑥 𝑥 𝑡 + 𝑊ℎℎℎ 𝑡−1), 𝑡 ≥ 1,
0 , 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
RNN Unrolling Through Time
Training RNN Networks
• Unrolling in time makes the turns the training task into
a feed-forward network.
• Since RNN represents a dynamical system we carry
weight matrices along the time steps.
𝑑𝑙
𝑑ℎ0
𝑑𝑙
𝑑ℎ1
𝑑𝑙
𝑑ℎ2
𝑑𝑙
𝑑ℎ3
Vanishing & Exploding Gradients
• At each timestep, running the BP algorithms causes the gradients to get
smaller and smaller until negligible.
• Unless 𝑑ℎ 𝑡
𝑑ℎ 𝑡−1
= 1, it tends to either diminish (< 1) or explode (> 1)
𝑑𝑙
𝑑ℎ 𝑡
through
repeated participation in computation.
𝑑𝑙
𝑑ℎ0
= 𝑡𝑖𝑛𝑦
𝑑𝑙
𝑑ℎ1
= 𝑠𝑚𝑎𝑙𝑙
𝑑𝑙
𝑑ℎ2
= 𝑚𝑒𝑑𝑖𝑢𝑚
𝑑𝑙
𝑑ℎ3
= 𝑙𝑎𝑟𝑔𝑒
Long Short Term Memory (LSTM)
LSTM
• To resolve the vanishing gradient
problem we want to design a network
that ensures derivative of recurrent
function is exactly 1.
• The idea behind LSTM is to add a
gated memory cell c to hidden nodes
for which
𝑑𝑐 𝑡
𝑑𝑐 𝑡−1
is exactly 1.
• Since derivative of the recurrent
function applied to memory cell is
exactly one, the value inside the cell
does not suffer from vanishing
gradient.
Picture from Alex Grave PHD Thesis
LSTM Cell– The Anatomy
Linear self-loop
Forget gate (sigmoid) controls
The value of state self-loop
Output gate (sigmoid) controls
the output’s contribution
Input gate (non-linear) controls
the input
0
1
𝜎(𝑥) =
1
1 − 𝑒−𝑥
tanh(𝑥) =
2
1 + 𝑒−2𝑥
− 1
LSTM Cell– Input and Output Gates
• Gates either allow information to
pass or block it.
• Gate functions perform an affine
transform followed by the sigmoid
function, which squashes the input
between 0 and 1.
• The output of the sigmoid is then
used to perform a componentwise
multiplication. This results in the
“gating" effect:
• if 𝜎 ≃ 1 ⇒ 𝑔𝑎𝑡𝑒 𝑖𝑠 𝑜𝑝𝑒𝑛: it will have little effect on the input.
• if 𝜎 ≃ 0 ⇒ 𝑔𝑎𝑡𝑒 𝑖𝑠 𝑐𝑙𝑜𝑠𝑒𝑑: it will block the input.
LSTM; the math
𝑑𝑐𝑡
𝑑𝑐𝑡−1
=
𝑑(𝑖 𝑡 ⊙ 𝑢 𝑡 + 𝑐𝑡−1)
𝑑𝑐𝑡−1
= 0 + 1 = 1
This is the main goal of LSTM to avoid vanishing or exploding gradients
How Much to update?
Gate Functions
Calculating the Next Hidden State; used in probability
calculation of lang. model: 𝑝𝑡 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑊ℎ𝑠ℎ 𝑡 + 𝑏𝑠)
Adding Forget Gate
• In most common variant of LSTM, a forget gate is added, so that the
value of memory can be reset. We often would like to forget memory cell
values after making an inference and no longer need the long term
dependencies.
Often a large bias to prevent the network from forgetting everything.
Modulating ct with ft enables the network to easily
clear it memory when justified by setting f-gate to 0.
Handwriting Recognition by Alex Graves
Recurrent Neural Networks - use cases
Image Caption Sentiment Analysis Machine Translation Video LabelingImage Labeling
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
class RNNModel(gluon.Block):
"""A model with an encoder, recurrent layer, and a decoder.”""
def __init__(self, mode, vocab_size, num_embed, num_hidden, num_layers, dropout=0.5, tie_weights=False, **kwargs):
super(RNNModel, self).__init__(**kwargs)
with self.name_scope():
self.drop = nn.Dropout(dropout)
self.encoder = nn.Embedding(vocab_size, num_embed, weight_initializer = mx.init.Uniform(0.1))
...
elif mode == 'lstm':
self.rnn = rnn.LSTM(num_hidden, num_layers, dropout=dropout, input_size=num_embed)
...
if tie_weights:
self.decoder = nn.Dense(vocab_size, in_units = num_hidden, params = self.encoder.params)
else:
self.decoder = nn.Dense(vocab_size, in_units = num_hidden)
self.num_hidden = num_hidden
Defining the Network
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
def forward(self, inputs, hidden):
emb = self.drop(self.encoder(inputs))
output, hidden = self.rnn(emb, hidden)
output = self.drop(output)
decoded = self.decoder(output.reshape((-1, self.num_hidden)))
return decoded, hidden
Defining the Network
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Apache MXNet and gluon
Deep Learning Fundamentals
Convolutional Neural Networks
Recurrent Neural Networks and LSTM
gluon API
B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dynamic graphs – trades off flexibility for scalability and performance
90%
Similar
Input Data Neural Network Model & Algorithm Prediction
Approach Optimized for Flexibility
Sentence 1
Michael threw
the football in
front of the
player
Sentence 2
The ball was
thrown short of
the target by
Michael
Flexible
Flexible
Flexible
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Challenges with Deep Learning
Training
Scalability
Flexibility
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Gluon is a technical specification providing a programming
interface that: (1) simplifies development of deep learning models,
(2) provides greater flexibility in building neural networks, and (3)
offers high performance
Simplicity Flexibility Performance
Gluon API Specification
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Gluon API is Available in Apache MXNet
Others…
Development of the Gluon API specification was
a collaboration between AWS and Microsoft
Available Now
Coming Soon
Gluon API is implemented in Apache
MXNet and soon will be in CNTK
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Advantages of the Gluon API
Simple, Easy-to-
Understand Code
Flexible, Imperative
Structure
Dynamic Graphs
High Performance
 Neural networks can be defined using simple, clear, concise code
 Plug-and-play neural network building blocks – including predefined layers,
optimizers, and initializers
 Eliminates rigidity of neural network model definition and brings together
the model with the training algorithm
 Intuitive, easy-to-debug, familiar code
 Neural networks can change in shape or size during the training process to
address advanced use cases where the size of data fed is variable
 Important area of innovation in Natural Language Processing (NLP)
 There is no sacrifice with respect to training speed
 When it is time to move from prototyping to production, easily cache neural
networks for high performance and a reduced memory footprint
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Simple, Easy-to-Understand Code
Use plug-and-play neural network building blocks, including
predefined layers, optimizers, and initializers
# First step is to initialize your model
net = gluon.nn.Sequential()
# Then, define your model architecture
with net.name_scope():
net.add(gluon.nn.Dense(256, activation="relu")) # 1st layer (256 nodes)
net.add(gluon.nn.Dense(256, activation="relu")) # 2nd hidden layer
net.add(gluon.nn.Dense(num_outputs)) # Output layer
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Flexible, Imperative Structure
Prototype, build, and debug neural networks more easily with a
fully imperative interface
epochs = 10
for e in range(epochs):
for i, batch in enumerate(train_data):
with autograd.record(): # Start recording the derivatives
output = net(data) # the forward iteration
loss = softmax_cross_entropy(output, label)
loss.backward()
trainer.step(data.shape[0])
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dynamic Graphs
Build neural networks on the fly for use cases where neural
networks must change in size and shape during model training
def forward(self, F, inputs, tree):
children_outputs = [self.forward(F, inputs, child)
for child in tree.children]
#Recursively builds the neural network based on each input sentence’s
#syntactic structure during the model definition and training process
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
High Performance
Easily cache the neural network to achieve high performance
when speed becomes more important than flexibility
net = nn.HybridSequential()
with net.name_scope():
net.add(nn.Dense(256, activation="relu"))
net.add(nn.Dense(128, activation="relu"))
net.add(nn.Dense(2))
net.hybridize()
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!
v i k m a d a n @ a m a z o n . c o m
c y r u s m v @ a m a z o n . c o m

Mais conteúdo relacionado

Mais procurados

Optimizing EC2 for Fun and Profit #bigsavings #newfeatures - CMP202 - re:Inve...
Optimizing EC2 for Fun and Profit #bigsavings #newfeatures - CMP202 - re:Inve...Optimizing EC2 for Fun and Profit #bigsavings #newfeatures - CMP202 - re:Inve...
Optimizing EC2 for Fun and Profit #bigsavings #newfeatures - CMP202 - re:Inve...Amazon Web Services
 
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...Amazon Web Services
 
DAT341_Working with Amazon ElastiCache for Redis
DAT341_Working with Amazon ElastiCache for RedisDAT341_Working with Amazon ElastiCache for Redis
DAT341_Working with Amazon ElastiCache for RedisAmazon Web Services
 
MCL302_Maximizing the Customer Experience with AI on AWS
MCL302_Maximizing the Customer Experience with AI on AWSMCL302_Maximizing the Customer Experience with AI on AWS
MCL302_Maximizing the Customer Experience with AI on AWSAmazon Web Services
 
ARC319_Multi-Region Active-Active Architecture
ARC319_Multi-Region Active-Active ArchitectureARC319_Multi-Region Active-Active Architecture
ARC319_Multi-Region Active-Active ArchitectureAmazon Web Services
 
MBL204_Architecting Cost-Effective Mobile Backends for Scale, Security, and P...
MBL204_Architecting Cost-Effective Mobile Backends for Scale, Security, and P...MBL204_Architecting Cost-Effective Mobile Backends for Scale, Security, and P...
MBL204_Architecting Cost-Effective Mobile Backends for Scale, Security, and P...Amazon Web Services
 
STG316_Optimizing Storage for Big Data Workloads
STG316_Optimizing Storage for Big Data WorkloadsSTG316_Optimizing Storage for Big Data Workloads
STG316_Optimizing Storage for Big Data WorkloadsAmazon Web Services
 
DAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into CloudDAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into CloudAmazon Web Services
 
SRV317_Unlocking High Performance Computing for Financial Services with Serve...
SRV317_Unlocking High Performance Computing for Financial Services with Serve...SRV317_Unlocking High Performance Computing for Financial Services with Serve...
SRV317_Unlocking High Performance Computing for Financial Services with Serve...Amazon Web Services
 
DEV207_Deploying and Managing Ruby Applications on AWS
DEV207_Deploying and Managing Ruby Applications on AWSDEV207_Deploying and Managing Ruby Applications on AWS
DEV207_Deploying and Managing Ruby Applications on AWSAmazon Web Services
 
DEV329_Cisco’s Journey from Monolith to Microservices
DEV329_Cisco’s Journey from Monolith to MicroservicesDEV329_Cisco’s Journey from Monolith to Microservices
DEV329_Cisco’s Journey from Monolith to MicroservicesAmazon Web Services
 
CTD201_Introduction to Amazon CloudFront and AWS Lambda@Edge
CTD201_Introduction to Amazon CloudFront and AWS Lambda@EdgeCTD201_Introduction to Amazon CloudFront and AWS Lambda@Edge
CTD201_Introduction to Amazon CloudFront and AWS Lambda@EdgeAmazon Web Services
 
CMP316_Hedge Your Own Funds Run Monte Carlo Simulations on EC2 Spot Fleet
CMP316_Hedge Your Own Funds Run Monte Carlo Simulations on EC2 Spot FleetCMP316_Hedge Your Own Funds Run Monte Carlo Simulations on EC2 Spot Fleet
CMP316_Hedge Your Own Funds Run Monte Carlo Simulations on EC2 Spot FleetAmazon Web Services
 
ARC331_How I Made My Motorbike Talk
ARC331_How I Made My Motorbike TalkARC331_How I Made My Motorbike Talk
ARC331_How I Made My Motorbike TalkAmazon Web Services
 
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Amazon Web Services
 
NET304_Deep Dive into the New Network Load Balancer
NET304_Deep Dive into the New Network Load BalancerNET304_Deep Dive into the New Network Load Balancer
NET304_Deep Dive into the New Network Load BalancerAmazon Web Services
 
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...Amazon Web Services
 
Optimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWSOptimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWSAmazon Web Services
 
DAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the CloudDAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the CloudAmazon Web Services
 
STG305_Deep Dive on Backup to the AWS Cloud
STG305_Deep Dive on Backup to the AWS CloudSTG305_Deep Dive on Backup to the AWS Cloud
STG305_Deep Dive on Backup to the AWS CloudAmazon Web Services
 

Mais procurados (20)

Optimizing EC2 for Fun and Profit #bigsavings #newfeatures - CMP202 - re:Inve...
Optimizing EC2 for Fun and Profit #bigsavings #newfeatures - CMP202 - re:Inve...Optimizing EC2 for Fun and Profit #bigsavings #newfeatures - CMP202 - re:Inve...
Optimizing EC2 for Fun and Profit #bigsavings #newfeatures - CMP202 - re:Inve...
 
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...
 
DAT341_Working with Amazon ElastiCache for Redis
DAT341_Working with Amazon ElastiCache for RedisDAT341_Working with Amazon ElastiCache for Redis
DAT341_Working with Amazon ElastiCache for Redis
 
MCL302_Maximizing the Customer Experience with AI on AWS
MCL302_Maximizing the Customer Experience with AI on AWSMCL302_Maximizing the Customer Experience with AI on AWS
MCL302_Maximizing the Customer Experience with AI on AWS
 
ARC319_Multi-Region Active-Active Architecture
ARC319_Multi-Region Active-Active ArchitectureARC319_Multi-Region Active-Active Architecture
ARC319_Multi-Region Active-Active Architecture
 
MBL204_Architecting Cost-Effective Mobile Backends for Scale, Security, and P...
MBL204_Architecting Cost-Effective Mobile Backends for Scale, Security, and P...MBL204_Architecting Cost-Effective Mobile Backends for Scale, Security, and P...
MBL204_Architecting Cost-Effective Mobile Backends for Scale, Security, and P...
 
STG316_Optimizing Storage for Big Data Workloads
STG316_Optimizing Storage for Big Data WorkloadsSTG316_Optimizing Storage for Big Data Workloads
STG316_Optimizing Storage for Big Data Workloads
 
DAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into CloudDAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into Cloud
 
SRV317_Unlocking High Performance Computing for Financial Services with Serve...
SRV317_Unlocking High Performance Computing for Financial Services with Serve...SRV317_Unlocking High Performance Computing for Financial Services with Serve...
SRV317_Unlocking High Performance Computing for Financial Services with Serve...
 
DEV207_Deploying and Managing Ruby Applications on AWS
DEV207_Deploying and Managing Ruby Applications on AWSDEV207_Deploying and Managing Ruby Applications on AWS
DEV207_Deploying and Managing Ruby Applications on AWS
 
DEV329_Cisco’s Journey from Monolith to Microservices
DEV329_Cisco’s Journey from Monolith to MicroservicesDEV329_Cisco’s Journey from Monolith to Microservices
DEV329_Cisco’s Journey from Monolith to Microservices
 
CTD201_Introduction to Amazon CloudFront and AWS Lambda@Edge
CTD201_Introduction to Amazon CloudFront and AWS Lambda@EdgeCTD201_Introduction to Amazon CloudFront and AWS Lambda@Edge
CTD201_Introduction to Amazon CloudFront and AWS Lambda@Edge
 
CMP316_Hedge Your Own Funds Run Monte Carlo Simulations on EC2 Spot Fleet
CMP316_Hedge Your Own Funds Run Monte Carlo Simulations on EC2 Spot FleetCMP316_Hedge Your Own Funds Run Monte Carlo Simulations on EC2 Spot Fleet
CMP316_Hedge Your Own Funds Run Monte Carlo Simulations on EC2 Spot Fleet
 
ARC331_How I Made My Motorbike Talk
ARC331_How I Made My Motorbike TalkARC331_How I Made My Motorbike Talk
ARC331_How I Made My Motorbike Talk
 
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
Reinforcement Learning – The Ultimate AI - ARC320 - re:Invent 2017
 
NET304_Deep Dive into the New Network Load Balancer
NET304_Deep Dive into the New Network Load BalancerNET304_Deep Dive into the New Network Load Balancer
NET304_Deep Dive into the New Network Load Balancer
 
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
 
Optimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWSOptimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWS
 
DAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the CloudDAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the Cloud
 
STG305_Deep Dive on Backup to the AWS Cloud
STG305_Deep Dive on Backup to the AWS CloudSTG305_Deep Dive on Backup to the AWS Cloud
STG305_Deep Dive on Backup to the AWS Cloud
 

Semelhante a MCL310_Building Deep Learning Applications with Apache MXNet and Gluon

Deep Learning Fundamentals
Deep Learning FundamentalsDeep Learning Fundamentals
Deep Learning FundamentalsThomas Delteil
 
Introduction to Machine Learning, Deep Learning and MXNet
Introduction to Machine Learning, Deep Learning and MXNetIntroduction to Machine Learning, Deep Learning and MXNet
Introduction to Machine Learning, Deep Learning and MXNetAmazon Web Services
 
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...Amazon Web Services
 
Build Deep Learning Applications Using Apache MXNet - Featuring Chick-fil-A (...
Build Deep Learning Applications Using Apache MXNet - Featuring Chick-fil-A (...Build Deep Learning Applications Using Apache MXNet - Featuring Chick-fil-A (...
Build Deep Learning Applications Using Apache MXNet - Featuring Chick-fil-A (...Amazon Web Services
 
MCL303-Deep Learning with Apache MXNet and Gluon
MCL303-Deep Learning with Apache MXNet and GluonMCL303-Deep Learning with Apache MXNet and Gluon
MCL303-Deep Learning with Apache MXNet and GluonAmazon Web Services
 
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...Amazon Web Services
 
GPSBUS201-GPS Demystifying Artificial Intelligence
GPSBUS201-GPS Demystifying Artificial IntelligenceGPSBUS201-GPS Demystifying Artificial Intelligence
GPSBUS201-GPS Demystifying Artificial IntelligenceAmazon Web Services
 
Amazon SageMaker Algorithms: Machine Learning Week San Francisco
Amazon SageMaker Algorithms: Machine Learning Week San FranciscoAmazon SageMaker Algorithms: Machine Learning Week San Francisco
Amazon SageMaker Algorithms: Machine Learning Week San FranciscoAmazon Web Services
 
SageMaker Algorithms Infinitely Scalable Machine Learning
SageMaker Algorithms Infinitely Scalable Machine LearningSageMaker Algorithms Infinitely Scalable Machine Learning
SageMaker Algorithms Infinitely Scalable Machine LearningAmazon Web Services
 
Debugging and Performance tricks for MXNet Gluon
Debugging and Performance tricks for MXNet GluonDebugging and Performance tricks for MXNet Gluon
Debugging and Performance tricks for MXNet GluonApache MXNet
 
[REPEAT] Deep Learning for Developers: An Introduction, Featuring Samsung SDS...
[REPEAT] Deep Learning for Developers: An Introduction, Featuring Samsung SDS...[REPEAT] Deep Learning for Developers: An Introduction, Featuring Samsung SDS...
[REPEAT] Deep Learning for Developers: An Introduction, Featuring Samsung SDS...Amazon Web Services
 
Working with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model TrainingWorking with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model TrainingAmazon Web Services
 
Building Applications with Apache MXNet
Building Applications with Apache MXNetBuilding Applications with Apache MXNet
Building Applications with Apache MXNetApache MXNet
 
What is deep learning (and why you should care) - Talk at SJSU Oct 2018
What is deep learning (and why you should care) - Talk at SJSU Oct 2018What is deep learning (and why you should care) - Talk at SJSU Oct 2018
What is deep learning (and why you should care) - Talk at SJSU Oct 2018Hagay Lupesko
 
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...Amazon Web Services
 
Deep Learning for Developers: An Introduction, Featuring Samsung SDS (AIM301-...
Deep Learning for Developers: An Introduction, Featuring Samsung SDS (AIM301-...Deep Learning for Developers: An Introduction, Featuring Samsung SDS (AIM301-...
Deep Learning for Developers: An Introduction, Featuring Samsung SDS (AIM301-...Amazon Web Services
 
MCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiMCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiAmazon Web Services
 
Time series modeling workd AMLD 2018 Lausanne
Time series modeling workd AMLD 2018 LausanneTime series modeling workd AMLD 2018 Lausanne
Time series modeling workd AMLD 2018 LausanneSunil Mallya
 

Semelhante a MCL310_Building Deep Learning Applications with Apache MXNet and Gluon (20)

Deep Learning Fundamentals
Deep Learning FundamentalsDeep Learning Fundamentals
Deep Learning Fundamentals
 
Introduction to Machine Learning, Deep Learning and MXNet
Introduction to Machine Learning, Deep Learning and MXNetIntroduction to Machine Learning, Deep Learning and MXNet
Introduction to Machine Learning, Deep Learning and MXNet
 
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402...
 
Build Deep Learning Applications Using Apache MXNet - Featuring Chick-fil-A (...
Build Deep Learning Applications Using Apache MXNet - Featuring Chick-fil-A (...Build Deep Learning Applications Using Apache MXNet - Featuring Chick-fil-A (...
Build Deep Learning Applications Using Apache MXNet - Featuring Chick-fil-A (...
 
MCL303-Deep Learning with Apache MXNet and Gluon
MCL303-Deep Learning with Apache MXNet and GluonMCL303-Deep Learning with Apache MXNet and Gluon
MCL303-Deep Learning with Apache MXNet and Gluon
 
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
Build Deep Learning Applications Using Apache MXNet, Featuring Workday (AIM40...
 
GPSBUS201-GPS Demystifying Artificial Intelligence
GPSBUS201-GPS Demystifying Artificial IntelligenceGPSBUS201-GPS Demystifying Artificial Intelligence
GPSBUS201-GPS Demystifying Artificial Intelligence
 
Amazon SageMaker Algorithms: Machine Learning Week San Francisco
Amazon SageMaker Algorithms: Machine Learning Week San FranciscoAmazon SageMaker Algorithms: Machine Learning Week San Francisco
Amazon SageMaker Algorithms: Machine Learning Week San Francisco
 
SageMaker Algorithms Infinitely Scalable Machine Learning
SageMaker Algorithms Infinitely Scalable Machine LearningSageMaker Algorithms Infinitely Scalable Machine Learning
SageMaker Algorithms Infinitely Scalable Machine Learning
 
Debugging and Performance tricks for MXNet Gluon
Debugging and Performance tricks for MXNet GluonDebugging and Performance tricks for MXNet Gluon
Debugging and Performance tricks for MXNet Gluon
 
Deep Learning with MXNet
Deep Learning with MXNetDeep Learning with MXNet
Deep Learning with MXNet
 
[REPEAT] Deep Learning for Developers: An Introduction, Featuring Samsung SDS...
[REPEAT] Deep Learning for Developers: An Introduction, Featuring Samsung SDS...[REPEAT] Deep Learning for Developers: An Introduction, Featuring Samsung SDS...
[REPEAT] Deep Learning for Developers: An Introduction, Featuring Samsung SDS...
 
Working with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model TrainingWorking with Amazon SageMaker Algorithms for Faster Model Training
Working with Amazon SageMaker Algorithms for Faster Model Training
 
Building Applications with Apache MXNet
Building Applications with Apache MXNetBuilding Applications with Apache MXNet
Building Applications with Apache MXNet
 
What is deep learning (and why you should care) - Talk at SJSU Oct 2018
What is deep learning (and why you should care) - Talk at SJSU Oct 2018What is deep learning (and why you should care) - Talk at SJSU Oct 2018
What is deep learning (and why you should care) - Talk at SJSU Oct 2018
 
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...
 
Deep Learning for Developers: An Introduction, Featuring Samsung SDS (AIM301-...
Deep Learning for Developers: An Introduction, Featuring Samsung SDS (AIM301-...Deep Learning for Developers: An Introduction, Featuring Samsung SDS (AIM301-...
Deep Learning for Developers: An Introduction, Featuring Samsung SDS (AIM301-...
 
MCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiMCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry Pi
 
Time series modeling workd AMLD 2018 Lausanne
Time series modeling workd AMLD 2018 LausanneTime series modeling workd AMLD 2018 Lausanne
Time series modeling workd AMLD 2018 Lausanne
 
GluonCV
GluonCVGluonCV
GluonCV
 

Mais de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

MCL310_Building Deep Learning Applications with Apache MXNet and Gluon

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT Building Deep Learning Applications with Apache MXNet and Gluon V i k r a m M a d a n & C y r u s V a h i d N o v e m b e r 2 8 , 2 0 1 7 M C L 3 1 0
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Apache MXNet and gluon Deep Learning Fundamentals Convolutional Neural Networks Recurrent Neural Networks and LSTM gluon API B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Apache MXNet and gluon Deep Learning Fundamentals Convolutional Neural Networks Recurrent Neural Networks and LSTM gluon API B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Goals of Machine Learning Source: http://cs231n.github.io/neural-networks-1/ • Starting with unlabeled data • Labeling the data as accurately as possible • Providing a function that probabilistically approximates a prediction based on input data as accurately as possible. 𝑓 𝑉 = 𝑌 𝑉 = 𝑣1, 𝑣2, … , 𝑣 𝑛 https://giphy.com/gifs/machine-learning-rfPnhoMtbeuwE
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Limitations of Statistical Models • Not very good at dealing with high- dimensional data. • Missing structural relatedness. • Not very good at picking out complicated patterns. • Require strong domain knowledge to perform feature engineering
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Biological & Artificial Neuron Source: http://cs231n.github.io/neural-networks-1/
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Perceptron I1 I2 B O w1 w2 w3 𝑓 𝑥𝑖, 𝑤𝑖 = Φ(𝑏 + Σ𝑖(𝑤𝑖. 𝑥𝑖)) Φ 𝑥 = ቊ 1, 𝑖𝑓 𝑥 ≥ 0.5 0, 𝑖𝑓 𝑥 < 0.5
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Perceptron I1 I2 B O 1 1 -1.5 𝑂1 = 1𝑥1 + 1𝑥1 + −1.5 = 0.5 ∴ Φ(𝑂1) = 1 𝐼1 = 𝐼2 = 𝐵1 = 1 𝑂1 = 1𝑥1 + 0𝑥1 + −1.5 = −0.5 ∴ Φ(𝑂1) = 0 𝐼2 = 0 ; 𝐼1 = 𝐵1 = 1
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Non-Linearity P Q P ∧ Q P ⨁ Q T T T T T F F F F T F F F F F T P Q x0 0 0 P Q x0 x 0
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deep Learning hidden layersInput layer output Add Non Linearity to output of hidden layer To transform output into continuous range
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The “Learning” in Deep Learning 0.4 0.3 0.2 0.9 ... backpropagation (gradient descent) X1 != X 0.4 ± 𝛿 0.3 ± 𝛿 new weights new weights 0 1 0 1 1 . . - X input label ... X1
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activation Functions Add nonlinearity to a layer and applied to the layer’s output There are several options:  Rectified Linear Unit (ReLU)  Sigmoid  Hyperbolic Tangent (tanh)  Softplus ReLU functions are the most commonly used today
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Linear Algebra & Matrix Multiplication Requirement # of Columns in A must equal # of Rows in B Output # of Rows in A # of Columns in B
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Matrix Multiplication with Neural Networks
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Inputs: Preprocessing, Batches, Epochs Preprocessing  Random separation of data into training, validation, and test sets  Necessary to measuring the accuracy of the model Batch  Amount of data propagated through network at every iteration  Enables faster optimization through shorter iteration cycles Epoch  Complete pass through all the training data  Optimization will have multiple epochs to reduce error rate
  • 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Inputs: Encoding MNIST data https://www.tensorflow.org/get_started/mnist/beginners
  • 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Inputs: Encoding Pictures into Data 7 x 7 x 3 Matrix
  • 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Inference • Regression • Binary Classification • Classification
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Classification with the Softmax Function Softmax converts the output layer into probabilities – necessary for classification Softmax Function
  • 20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Loss Function • It is an objective function that quantifies how successful the model was in its predictions • It is a measure of the difference between a neural net’s prediction and the actual value – that is, the error • Typically, we use Cross Entropy Loss, which adjusts the plain loss calculation to mitigate learning slowdown • Backpropagation is performed to calculate the error contribution of each neuron after processing one batch
  • 21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gradient Descent Iteratively update parameters to get the most optimal value for the objective function
  • 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gradient Descent Iteratively update parameters to get the most optimal value for the objective function
  • 23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Stochastic Gradient Descent Gradient Descent A single iteration for the parameter update runs through ALL of the training data Stochastic Gradient Descent, A single iteration for the parameter update runs through a BATCH of the training data
  • 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Optimizers http://imgur.com/a/Hqolp
  • 25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Learning Rates • Learning Rate: It is a real number that decides how far to move down in the direction of steepest gradient • Online Learning: Weights are updated at each step (slow to learn) • Batch Learning: Weights are updated after all training data is processed (hard to optimize) • Mini-Batch: Combination of both when we break up the training set into smaller batches and update the weights after each mini-batch
  • 26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Training and Validation Data Best model When only evaluating accuracy using the training set, we face the Overfitting issue
  • 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Overfitting, Regularization, and Dropout Srivastava, Nitish, et al. ”Dropout: a simple way to prevent neural networks from overfitting”, JMLR 2014
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. mnist = mx.test_utils.get_mnist() batch_size = 64 num_inputs = 784 num_outputs = 10 def transform(data, label): return data.astype(np.float32)/255, label.astype(np.float32) train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform), batch_size, shuffle=True) test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform), batch_size, shuffle=False) Load the Data
  • 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. num_hidden = 256 net = gluon.nn.Sequential() with net.name_scope(): net.add(gluon.nn.Dense(num_hidden, activation="relu")) net.add(gluon.nn.Dense(num_hidden, activation="relu")) net.add(TestDense(128)) net.add(gluon.nn.Dense(10)) Define The Model
  • 30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. class TestDense(gluon.Block): def __init__(self, num_dense, **kwargs): super(TestDense, self).__init__(**kwargs) self.num_dense = num_dense self.new_dense = gluon.nn.Dense(num_dense, activation="relu") def forward(self, x): return self.new_dense(x) Custom Dense Layer
  • 31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx) Initialize the Model
  • 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss() l2loss = gluon.loss.L2Loss() Define Loss Function
  • 33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .01, ‘momentum’: .9}) Choose an Optimizer
  • 34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. epochs = 10 for e in range(epochs): for i, (data, label) in enumerate(train_data): data = data.as_in_context(ctx).reshape((-1, 784)) label = label.as_in_context(ctx) with autograd.record(): output = net(data) loss = softmax_cross_entropy(output, label) loss.backward() trainer.step(data.shape[0]) Train the Model
  • 35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Apache MXNet and gluon Deep Learning Fundamentals Convolutional Neural Networks Recurrent Neural Networks and LSTM gluon API B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
  • 36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Framing Problem and Spatial Correlation
  • 37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. MLP and Framing
  • 38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. MLP and Framing Problem • Input Layer takes the raw array of data. • Feature Extraction layers extract features through: • The first layer performs several convolutions in parallel to produce a set of linear activations. • In the second stage (detector), each linear activation is run through a nonlinear activation function, such as ReLU • The third layer performs pooling on the output • In the end fully-connected layers, reassemble the features into final output and apply Softmax to predict create a probabilistic distribution.
  • 39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Convolution Neural Networks (CNN)
  • 40. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Convolutions
  • 41. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Convolutions
  • 42. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pooling and Strides
  • 43. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Pooling Output
  • 44. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Full Convolution Neural Network Structure
  • 45. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. num_fc = 512 net = gluon.nn.Sequential() with net.name_scope(): net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu')) net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2)) net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu')) net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2)) # The Flatten layer collapses all axis, except the first one, into one axis. net.add(gluon.nn.Flatten()) net.add(gluon.nn.Dense(num_fc, activation="relu")) net.add(gluon.nn.Dense(num_outputs)) Build CNN in gluon
  • 46. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Apache MXNet and gluon Deep Learning Fundamentals Convolutional Neural Networks Recurrent Neural Networks and LSTM gluon API B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
  • 47.
  • 48. Seine steuerliche Bewertung wäre günstiger gewesen, wenn seine Steuererklärung ür das Geschäftsjahr 1927 früher überprüft worden wäre His tax assessment would have been more favorable if his tax return for fiscal year 1927 had been checked earlier.
  • 49. Seine steuerliche Bewertung wäre günstiger gewesen, wenn seine Steuererklärung für das Geschäftsjahr 1927 früher überprüft worden wäre His tax assessment would have been more favorable if his tax return for fiscal year 1927 had been checked earlier. Who?
  • 50. Seine steuerliche Bewertung wäre günstiger gewesen, wenn seine Steuererklärung ür das Geschäftsjahr 1927 früher überprüft worden wäre His tax assessment would have been more favorable if his tax return for fiscal year 1927 had been checked earlier. Who? Handling Long-Term Dependencies
  • 51. • The Spanish King officially abdicated in … of his …, Felipe. Felipe will be confirmed tomorrow as the new Spanish … . Sequences and Long-Term Dependency
  • 52. • The Spanish King officially abdicated in favour of his son, Felipe. Felipe will be confirmed tomorrow as the new Spanish King . Sequences and Long-Term Dependency
  • 53. • Their duo was known as "Buck and Bubbles”. Bubbles tapped along as Buck played on the stride piano and sang until Bubbles was totally tapped out. Sequences and Long-Term Dependency
  • 54. • CNN is news organization • CNN is short for Convolutional Neural Network • In our quest to implement perfect NLP tools, we have developed state of the art RNNs. Now we can use them to … . Sequences and Long-Term Dependency
  • 55. • CNN is news organization • CNN is short for Convolutional Neural Network • In our quest to implement perfect NLP tools, we have developed state of the art RNNs. Now we can use them to wreck a nice beach. (Geoffrey Hinton – Coursera) Sequences and Long-Term Dependency
  • 56. Computational Graphs x y z x Let x and y be matrices and z the dot product; 𝑧 = 𝑥 ⋅ 𝑦 ො𝑦 = 𝑥 ⋅ 𝑤 𝑤𝑒𝑖𝑔ℎ𝑡 𝑑𝑒𝑐𝑎𝑦 𝑝𝑒𝑛𝑎𝑙𝑡𝑦 = 𝜆Σ𝑖 𝑤𝑖 2 x w ො𝑦 dot 𝑢(1) sqr 𝑢(2) sum 𝜆 𝑢(3) x
  • 57. Dynamical Systems and Timesteps • 𝑠 𝑡 = 𝑓 𝑠 𝑡 −1 ; 𝜃 ; where 𝜃 𝑖𝑠 𝑚𝑜𝑑𝑒𝑙 𝑝𝑎𝑟𝑎𝑚𝑠. (A) For instance if t = 3 then: 𝑠(3) = 𝑓 𝑠 2 ; 𝜃 = 𝑓 𝑓 𝑠 1 ; 𝜃 ; 𝜃 • 𝑠 𝑡 = 𝑓 𝑠 𝑡 −1 , 𝑥(𝑡) ; 𝜃 ; 𝑤ℎ𝑒𝑟𝑒 𝑥 𝑖𝑠 𝑒𝑥𝑡𝑒𝑟𝑛𝑎𝑙 𝑠𝑖𝑔𝑛𝑎𝑙 𝑠𝑢𝑐ℎ 𝑎𝑠 𝑖𝑛𝑝𝑢𝑡 𝑣𝑒𝑐𝑡𝑜𝑟 (B) • ℎ 𝑡 = 𝑓 ℎ 𝑡 −1 , 𝑥(𝑡) ; 𝜃 ; 𝑤ℎ𝑒𝑟𝑒 ℎ 𝑖𝑠 𝑡ℎ𝑒 𝑠𝑡𝑎𝑡𝑒 𝑜𝑓 ℎ𝑖𝑑𝑑𝑒𝑛 𝑛𝑜𝑑𝑒 (C) x h Unfold 𝑥(𝑡−1) ℎ(𝑡−1)ℎ(… ) 𝑓 𝑥(𝑡) 𝑡(𝑡) 𝑓 𝑥(𝑡+1) ℎ(𝑡+1) 𝑓 ℎ(… ) http://www.deeplearningbook.org/
  • 58. RNN Unrolling Through Time 0 RNN 𝑥1 = 𝑒𝑚𝑏 ”𝑖” t = 0 RNN 𝑥2 = 𝑒𝑚𝑏 ”𝑎𝑚” t = 1 RNN 𝑥3 = 𝑒𝑚𝑏 ”𝑓𝑟𝑜𝑚” t = 2 softmax 𝑦 RNN 𝑥4 = 𝑒𝑚𝑏 ”𝑒𝑎𝑟𝑡ℎ” t = 3 X = I am from Earth 𝑝(“𝑖”) t = 0 𝑝(“𝑎𝑚”|”𝑖”) t = 1 𝑝(“𝑓𝑟𝑜𝑚”|”𝑎𝑚”, “𝑖”) t = 2 𝑝(“𝑒𝑎𝑟𝑡ℎ”|“𝑓𝑟𝑜𝑚”, ”𝑎𝑚”, “𝑖”) t = 3 ℎ 𝑡 = ቊ 𝑡𝑎𝑛ℎ(𝑊ℎ𝑥 𝑥 𝑡 + 𝑊ℎℎℎ 𝑡−1), 𝑡 ≥ 1, 0 , 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
  • 60. Training RNN Networks • Unrolling in time makes the turns the training task into a feed-forward network. • Since RNN represents a dynamical system we carry weight matrices along the time steps. 𝑑𝑙 𝑑ℎ0 𝑑𝑙 𝑑ℎ1 𝑑𝑙 𝑑ℎ2 𝑑𝑙 𝑑ℎ3
  • 61. Vanishing & Exploding Gradients • At each timestep, running the BP algorithms causes the gradients to get smaller and smaller until negligible. • Unless 𝑑ℎ 𝑡 𝑑ℎ 𝑡−1 = 1, it tends to either diminish (< 1) or explode (> 1) 𝑑𝑙 𝑑ℎ 𝑡 through repeated participation in computation. 𝑑𝑙 𝑑ℎ0 = 𝑡𝑖𝑛𝑦 𝑑𝑙 𝑑ℎ1 = 𝑠𝑚𝑎𝑙𝑙 𝑑𝑙 𝑑ℎ2 = 𝑚𝑒𝑑𝑖𝑢𝑚 𝑑𝑙 𝑑ℎ3 = 𝑙𝑎𝑟𝑔𝑒
  • 62. Long Short Term Memory (LSTM)
  • 63. LSTM • To resolve the vanishing gradient problem we want to design a network that ensures derivative of recurrent function is exactly 1. • The idea behind LSTM is to add a gated memory cell c to hidden nodes for which 𝑑𝑐 𝑡 𝑑𝑐 𝑡−1 is exactly 1. • Since derivative of the recurrent function applied to memory cell is exactly one, the value inside the cell does not suffer from vanishing gradient. Picture from Alex Grave PHD Thesis
  • 64. LSTM Cell– The Anatomy Linear self-loop Forget gate (sigmoid) controls The value of state self-loop Output gate (sigmoid) controls the output’s contribution Input gate (non-linear) controls the input 0 1 𝜎(𝑥) = 1 1 − 𝑒−𝑥 tanh(𝑥) = 2 1 + 𝑒−2𝑥 − 1
  • 65. LSTM Cell– Input and Output Gates • Gates either allow information to pass or block it. • Gate functions perform an affine transform followed by the sigmoid function, which squashes the input between 0 and 1. • The output of the sigmoid is then used to perform a componentwise multiplication. This results in the “gating" effect: • if 𝜎 ≃ 1 ⇒ 𝑔𝑎𝑡𝑒 𝑖𝑠 𝑜𝑝𝑒𝑛: it will have little effect on the input. • if 𝜎 ≃ 0 ⇒ 𝑔𝑎𝑡𝑒 𝑖𝑠 𝑐𝑙𝑜𝑠𝑒𝑑: it will block the input.
  • 66. LSTM; the math 𝑑𝑐𝑡 𝑑𝑐𝑡−1 = 𝑑(𝑖 𝑡 ⊙ 𝑢 𝑡 + 𝑐𝑡−1) 𝑑𝑐𝑡−1 = 0 + 1 = 1 This is the main goal of LSTM to avoid vanishing or exploding gradients How Much to update? Gate Functions Calculating the Next Hidden State; used in probability calculation of lang. model: 𝑝𝑡 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑊ℎ𝑠ℎ 𝑡 + 𝑏𝑠)
  • 67. Adding Forget Gate • In most common variant of LSTM, a forget gate is added, so that the value of memory can be reset. We often would like to forget memory cell values after making an inference and no longer need the long term dependencies. Often a large bias to prevent the network from forgetting everything. Modulating ct with ft enables the network to easily clear it memory when justified by setting f-gate to 0.
  • 69. Recurrent Neural Networks - use cases Image Caption Sentiment Analysis Machine Translation Video LabelingImage Labeling
  • 70. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. class RNNModel(gluon.Block): """A model with an encoder, recurrent layer, and a decoder.”"" def __init__(self, mode, vocab_size, num_embed, num_hidden, num_layers, dropout=0.5, tie_weights=False, **kwargs): super(RNNModel, self).__init__(**kwargs) with self.name_scope(): self.drop = nn.Dropout(dropout) self.encoder = nn.Embedding(vocab_size, num_embed, weight_initializer = mx.init.Uniform(0.1)) ... elif mode == 'lstm': self.rnn = rnn.LSTM(num_hidden, num_layers, dropout=dropout, input_size=num_embed) ... if tie_weights: self.decoder = nn.Dense(vocab_size, in_units = num_hidden, params = self.encoder.params) else: self.decoder = nn.Dense(vocab_size, in_units = num_hidden) self.num_hidden = num_hidden Defining the Network
  • 71. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. def forward(self, inputs, hidden): emb = self.drop(self.encoder(inputs)) output, hidden = self.rnn(emb, hidden) output = self.drop(output) decoded = self.decoder(output.reshape((-1, self.num_hidden))) return decoded, hidden Defining the Network
  • 72. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Apache MXNet and gluon Deep Learning Fundamentals Convolutional Neural Networks Recurrent Neural Networks and LSTM gluon API B u i l d i n g D e e p L e a r n i n g A p p l i c a t i o n s w i t h
  • 73. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Dynamic graphs – trades off flexibility for scalability and performance 90% Similar Input Data Neural Network Model & Algorithm Prediction Approach Optimized for Flexibility Sentence 1 Michael threw the football in front of the player Sentence 2 The ball was thrown short of the target by Michael Flexible Flexible Flexible
  • 74. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Challenges with Deep Learning Training Scalability Flexibility
  • 75. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gluon is a technical specification providing a programming interface that: (1) simplifies development of deep learning models, (2) provides greater flexibility in building neural networks, and (3) offers high performance Simplicity Flexibility Performance Gluon API Specification
  • 76. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Gluon API is Available in Apache MXNet Others… Development of the Gluon API specification was a collaboration between AWS and Microsoft Available Now Coming Soon Gluon API is implemented in Apache MXNet and soon will be in CNTK
  • 77. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Advantages of the Gluon API Simple, Easy-to- Understand Code Flexible, Imperative Structure Dynamic Graphs High Performance  Neural networks can be defined using simple, clear, concise code  Plug-and-play neural network building blocks – including predefined layers, optimizers, and initializers  Eliminates rigidity of neural network model definition and brings together the model with the training algorithm  Intuitive, easy-to-debug, familiar code  Neural networks can change in shape or size during the training process to address advanced use cases where the size of data fed is variable  Important area of innovation in Natural Language Processing (NLP)  There is no sacrifice with respect to training speed  When it is time to move from prototyping to production, easily cache neural networks for high performance and a reduced memory footprint
  • 78. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Simple, Easy-to-Understand Code Use plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers # First step is to initialize your model net = gluon.nn.Sequential() # Then, define your model architecture with net.name_scope(): net.add(gluon.nn.Dense(256, activation="relu")) # 1st layer (256 nodes) net.add(gluon.nn.Dense(256, activation="relu")) # 2nd hidden layer net.add(gluon.nn.Dense(num_outputs)) # Output layer
  • 79. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Flexible, Imperative Structure Prototype, build, and debug neural networks more easily with a fully imperative interface epochs = 10 for e in range(epochs): for i, batch in enumerate(train_data): with autograd.record(): # Start recording the derivatives output = net(data) # the forward iteration loss = softmax_cross_entropy(output, label) loss.backward() trainer.step(data.shape[0])
  • 80. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Dynamic Graphs Build neural networks on the fly for use cases where neural networks must change in size and shape during model training def forward(self, F, inputs, tree): children_outputs = [self.forward(F, inputs, child) for child in tree.children] #Recursively builds the neural network based on each input sentence’s #syntactic structure during the model definition and training process
  • 81. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. High Performance Easily cache the neural network to achieve high performance when speed becomes more important than flexibility net = nn.HybridSequential() with net.name_scope(): net.add(nn.Dense(256, activation="relu")) net.add(nn.Dense(128, activation="relu")) net.add(nn.Dense(2)) net.hybridize()
  • 82. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you! v i k m a d a n @ a m a z o n . c o m c y r u s m v @ a m a z o n . c o m