© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Build Deep Learning Applications
Using MXNet and Amazon SageMaker
Cyrus M. Vahid
Principal Evangelist at AWS AI Labs
Amazon Web Services – Deep Engine
AIM418
Agenda
Machine learning
Deep learning
Multi-layer perceptron
Convolutional neural networks
Gluon
Intelligence
[Ω → Ε ∧ Η ∨ (Γ ∨ ~Γ)] ∧ (Φ ∧ ~Φ) ∴ F
Would you kindly tell me if you have the phone number of the queen?
The Spanish King officially abdicated in favour of his son, Felipe. Felipe will be confirmed tomorrow as the new Spanish King.
In our quest to implement perfect NLP tools, we have developed state-of-the-art RNNs. Now we can use them to wreck a nice beach. (Geoffrey Hinton – Coursera)
Biological learning
Source: http://cs231n.github.io/neural-networks-1/
Perceptron
[Figure: a perceptron with inputs I1, I2, and bias B, weights w1, w2, w3, and output O.]

f(x, w) = Φ(b + Σᵢ(wᵢ · xᵢ))
Φ(x) = 1 if x ≥ 0.5, 0 if x < 0.5
Perceptron
[Figure: the same perceptron with weights w1 = 1, w2 = 1 and bias weight −1.5.]

I1 = I2 = B = 1:  O = 1·1 + 1·1 + 1·(−1.5) = 0.5 ∴ Φ(O) = 1
I1 = B = 1, I2 = 0:  O = 1·1 + 0·1 + 1·(−1.5) = −0.5 ∴ Φ(O) = 0

This reproduces the truth table for AND:

P Q | P ∧ Q
T T |   T
T F |   F
F T |   F
F F |   F

[Figure: the P–Q plane; a single line separates the one true case from the three false cases.]
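The two cases above can be checked in a few lines of Python; this is a minimal sketch of the perceptron on the slide (the function names are ours, for illustration):

def phi(x):
    # step activation with threshold 0.5, as defined above
    return 1 if x >= 0.5 else 0

def perceptron_and(i1, i2, w=(1.0, 1.0), b=-1.5):
    o = w[0] * i1 + w[1] * i2 + b  # the bias input is fixed at 1
    return phi(o)

for p, q in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(p, q, perceptron_and(p, q))  # reproduces the AND truth table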
Non-linear space

P Q | P ∧ Q | P ⊕ Q
T T |   T   |   F
T F |   F   |   T
F T |   F   |   T
F F |   F   |   F

[Figures: the P–Q plane for P ∧ Q (separable by a single line) and for P ⊕ Q (no single line separates the classes).]
Deep learning
[Figure: a fully connected network with an input layer, hidden layers, and an output layer.]
The “learning” in deep learning
[Figure: a forward pass maps input X (with its label from the data) to a prediction ŷ; when ŷ ≠ y, backpropagation (gradient descent) adjusts each weight by ±δ to produce new weights, and the process repeats.]
Universal function approximation
• Let φ(·) be a nonconstant, bounded, and monotonically increasing continuous function
• Let Iₘ denote the m-dimensional unit hypercube [0,1]ᵐ. The space of continuous functions on Iₘ is denoted by C(Iₘ)
• Then, given ε > 0 and any function f ∈ C(Iₘ), there exist an integer N, real constants vᵢ, bᵢ ∈ ℝ, and real vectors wᵢ ∈ ℝᵐ, where i = 1, 2, …, N, such that we may define

F(x) = Σᵢ₌₁ᴺ vᵢ φ(wᵢᵀx + bᵢ)

as an approximate realization of the function f, where F is independent of φ; that is,

|F(x) − f(x)| < ε for all x in Iₘ
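As a toy illustration of the theorem (ours, not from the deck), the sketch below draws random wᵢ, bᵢ and then solves for the output weights vᵢ by least squares so that F approximates f(x) = sin(2πx) on [0, 1]:

import numpy as np

phi = lambda z: 1.0 / (1.0 + np.exp(-z))     # bounded, monotone sigmoid

rng = np.random.default_rng(0)
N = 50
w = rng.normal(scale=10.0, size=N)           # random "hidden" weights
b = rng.uniform(-10.0, 10.0, size=N)

x = np.linspace(0.0, 1.0, 200)
f = np.sin(2 * np.pi * x)                    # a target function in C(I_1)

H = phi(np.outer(x, w) + b)                  # 200 x N matrix of phi(w_i x + b_i)
v, *_ = np.linalg.lstsq(H, f, rcond=None)    # fit the output weights v_i

print("max |F(x) - f(x)| =", np.abs(H @ v - f).max())  # a small epsilon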
Activation functions
Gradient descent
• After training over the data we still have an error surface
• The goal of optimization is to reach the minima of the surface, and thus reduce error
Gradient descent
• Loss function, J, is a measure of how well an algorithm models a dataset
• There are several loss functions and one can combine them. Some of the more popular loss functions are RMSE, hinge, L1, L2, …
• For more information please check: https://tinyurl.com/y7c6ub5k
• Weights are adjusted in the opposite direction of the calculated gradients:

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ    (α is the learning rate; ∂J(θ)/∂θⱼ is the gradient)
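To make the update rule concrete, here is a minimal sketch (ours, not from the deck) that minimizes the one-dimensional loss J(θ) = (θ − 3)² with plain Python:

alpha = 0.1                 # learning rate
theta = 0.0                 # starting point

for step in range(100):
    grad = 2 * (theta - 3)  # dJ/dtheta
    theta -= alpha * grad   # step against the gradient

print(theta)                # approaches the minimum at theta = 3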
Non-convex error surface
• f: ℝⁿ → ℝ is convex if and only if ∀ x₁, x₂ ∈ ℝⁿ and ∀ λ ∈ [0,1]:
  f(λx₁ + (1 − λ)x₂) ≤ λf(x₁) + (1 − λ)f(x₂)
• With a convex objective and a convex feasible region, there can be only one optimal solution (globally optimal)
• A non-convex optimization problem may have multiple feasible regions and multiple locally optimal points within each region
• It can take exponential time to determine that there is no solution, that an optimal solution exists, or that the objective function is unbounded
• In deep learning we almost exclusively need to solve a complex non-convex optimization problem in an n-dimensional vector space

[Figure: an error surface with two global optima and a local optimum.]
Recap
• A neural network with at least one hidden layer can approximate any function
• Training a network (backpropagation) consists of:
  • Initializing weights at “random”
  • Computing the network forward (forward pass)
  • Reducing loss by updating weights in the opposite direction of the gradient of the loss function
  • Repeating the process until an optimized set of weights is calculated
• The optimization is complicated and computationally very intensive due to the non-convexity of the optimization space
Minibatch training
• Updating millions of weights after every single example is inefficient (online)
• Updating weights only at the end of each run over all data is not effective (batch)
• We use minibatch training to capture the best of the two worlds
• An epoch is one forward and backward pass over all of the data
• Batch size is the number of training examples in one forward/backward pass
• https://tinyurl.com/yc2l63lq
• https://tinyurl.com/yaof5axr
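A minimal sketch (ours) of how epochs and minibatches interact; the data shapes and batch size are illustrative:

import numpy as np

X = np.random.rand(1000, 784)            # hypothetical inputs
y = np.random.randint(0, 10, size=1000)  # hypothetical labels
batch_size = 64

for epoch in range(3):                   # one epoch = one pass over all data
    order = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        x_batch, y_batch = X[idx], y[idx]
        # forward pass, loss, backward pass, and weight update go here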
Learning rate adjustment
• If the learning rate is too large, training will not converge due to oscillation
• If the learning rate is too small, convergence will take a very long time
• In state-of-the-art systems it is common to use a learning rate scheduler. For more information please refer to:
  • https://tinyurl.com/y9mcfvjf
  • https://tinyurl.com/ybxyncgs
  • https://tinyurl.com/qfp2kfq
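As one concrete example, MXNet ships a FactorScheduler that decays the rate on a fixed step schedule; the numbers below are ours, for illustration:

import mxnet as mx

schedule = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.5)  # halve every 1000 updates
optimizer = mx.optimizer.SGD(learning_rate=0.1, lr_scheduler=schedule)
# trainer = mx.gluon.Trainer(net.collect_params(), optimizer)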
Saddle points
• When the partial derivatives with respect to all variables are zero we have a critical point; a saddle point is a critical point that is a minimum along some directions and a maximum along others
• f(x, y) = x² − y²;  ∂f/∂x = 2x,  ∂f/∂y = −2y
• At (0, 0): ∂f/∂x = ∂f/∂y = 0
• f(x, 0) = x² has a local minimum at x = 0
• f(0, y) = −y² has a local maximum at y = 0
• This results in a saddle point at (0, 0), which can look like a stable minimum to gradient descent along the x direction
• (0, 0), as demonstrated in the picture, is not a global optimum
Flavors of SGD
http://ruder.io/optimizing-gradient-descent/index.html#gradientdescentvariants
Overfitting
• Overfitting happens when the model learns the noise in the training data as well as the signal
• This prevents the model from generalizing well on unseen data
• Overfitting can result from having too few data points, noisy data, or too large a network for the existing data
Dropout and drop connect
[Figure: a regular network; the same network with dropout (randomly removing units); and with drop connect (randomly removing weights).]
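In Gluon, dropout is just a layer; a minimal sketch (the layer sizes are our assumption) of dropping hidden activations with probability 0.5 during training:

import mxnet as mx
from mxnet import gluon

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation='relu'))
    net.add(gluon.nn.Dropout(0.5))   # active in training, a no-op at inference
    net.add(gluon.nn.Dense(10))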
Computational dependency/graph
• z = x ⋅ y
• k = a ⋅ b
• t = λz + k

[Figure: the computation as a graph; z = x·y and k = a·b are independent (step 1) and can run in parallel, u = λz follows (step 2), and t = u + k completes last (step 3).]
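The same dependency graph can be recorded and differentiated with MXNet autograd; a small sketch (ours) with λ = 2:

from mxnet import autograd, nd

x, y, a, b = [nd.array([v]) for v in (1.0, 2.0, 3.0, 4.0)]
for v in (x, y, a, b):
    v.attach_grad()

with autograd.record():
    z = x * y        # step 1 (independent of k, can run in parallel)
    k = a * b        # step 1
    t = 2 * z + k    # steps 2 and 3

t.backward()
print(x.grad)        # dt/dx = lambda * y = 4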
Computational dependency/graph

net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=10)
net = mx.sym.SoftmaxOutput(net, name='softmax')
Training
import mxnet as mx
import logging
logging.getLogger().setLevel(logging.DEBUG)  # logging to stdout

# create a trainable module on compute context
mlp_model = mx.mod.Module(symbol=mlp, context=ctx)
mlp_model.fit(train_iter,
              eval_data=val_iter,
              optimizer='sgd',
              optimizer_params={'learning_rate': 0.1},
              eval_metric='acc',
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),
              num_epoch=10)
Training efficiency—92%
https://mxnet.incubator.apache.org/tutorials/vision/large_scale_classification.html
Amazon SageMaker
Build, train, and deploy machine learning models at scale
• End-to-end machine learning platform
• Zero setup
• Flexible model training
• Pay by the second ($)
Amazon SageMaker and distributed training
• Faster training through Amazon SageMaker streaming for custom algorithms
• Boilerplate code for your algorithms to train over a cluster

[Figure: PCA benchmark.]

if len(hosts) == 1:
    kvstore = 'device' if num_gpus > 0 else 'local'
else:
    kvstore = 'dist_device_sync' if num_gpus > 0 else 'dist_sync'

trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': learning_rate, 'momentum': momentum},
                        kvstore=kvstore)
Training code
Amazon provided algorithms:
• Matrix factorization
• Regression
• Principal component analysis
• K-means clustering
• Gradient boosted trees
• And more!

Bring your own script (IM builds the container), or bring your own algorithm (you build the container).

[Figure: the training flow; fetch training data, train on a fully managed and secured fleet (CPU/GPU, distributed training, HPO, IM estimators in Apache Spark), save model artifacts, and save the inference image to Amazon ECR.]
Automatic model tuning
Training code:
• Factorization machine
• Regression/classification
• Principal component analysis
• K-means clustering
• XGBoost
• DeepAR
• And more

Amazon SageMaker built-in algorithms, bring your own script (prebuilt containers), or bring your own algorithm.

[Figure: the same training flow (fetch training data, fully managed and secured training, save model artifacts), now wrapped by automatic model tuning.]
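A minimal sketch of automatic model tuning with the SageMaker Python SDK; the estimator, metric name, ranges, and job counts below are illustrative assumptions:

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

tuner = HyperparameterTuner(
    estimator=estimator,                      # any SageMaker estimator
    objective_metric_name='validation:accuracy',
    hyperparameter_ranges={
        'learning_rate': ContinuousParameter(0.001, 0.1),
    },
    max_jobs=20,                              # total training jobs to run
    max_parallel_jobs=2)                      # run two at a time
tuner.fit({'train': 's3://bucket/train', 'validation': 's3://bucket/val'})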
Evolution of deep learning frameworks
Why Gluon?
• Simple, easy-to-understand code
• Flexible, imperative structure
• Dynamic graphs
• High performance
Define the network

net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Dense(units=64, activation='relu'))
    net.add(gluon.nn.Dense(units=10))
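Because the block is a HybridSequential, it can also be compiled into a static graph for speed with one extra call (a standard Gluon facility, not shown on the slide):

net.hybridize()  # compile the imperative Gluon code into a symbolic graph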
Initialize the model

net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx, force_reinit=True)
Loss function

softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
Choose an optimizer
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.02})
Load the data

mnist = mx.test_utils.get_mnist()
batch_size = 64
num_inputs = 784
num_outputs = 10

def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)

train_data = mx.gluon.data.DataLoader(
    mx.gluon.data.vision.MNIST(train=True, transform=transform),
    batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(
    mx.gluon.data.vision.MNIST(train=False, transform=transform),
    batch_size, shuffle=False)
Training

for e in range(10):
    cumulative_loss = 0
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx).reshape((-1, 784))
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        trainer.step(data.shape[0])
        cumulative_loss += nd.sum(loss).asscalar()  # accumulate the epoch loss
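After training, accuracy on the held-out set can be checked with the standard MXNet metric API; a short sketch (ours, reusing test_data and model_ctx from above):

metric = mx.metric.Accuracy()
for data, label in test_data:
    data = data.as_in_context(model_ctx).reshape((-1, 784))
    label = label.as_in_context(model_ctx)
    metric.update(labels=[label], preds=[nd.argmax(net(data), axis=1)])
print(metric.get())  # ('accuracy', <fraction correct>)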
Spatial relatedness
Convolution*
• Convolution is a specialized kind of linear operation
• We use a reduction mechanism that is weighted differently based on relevance
• Example: Measuring the location of a spaceship along its trajectory creates a discrete set of measurements. Each one could be fuzzy, but a weighted average removes noise and gives a better prediction of the current location, with more weight given to the most recent positions.
• x is often called the input (often a multi-dimensional array of data) and w is called the kernel (often a multi-dimensional array of parameters)
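The discrete form of this weighted average is s(t) = Σₐ x(a)·w(t − a); a tiny numpy sketch (ours) smoothing hypothetical measurements:

import numpy as np

x = np.array([1.0, 1.2, 0.9, 1.4, 1.1, 1.3])  # hypothetical noisy positions
w = np.array([0.5, 0.3, 0.2])                  # kernel, most recent weighted highest

s = np.convolve(x, w, mode='valid')            # s[t] = sum_a x[a] * w[t - a]
print(s)                                       # smoothed position estimates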
Pooling
http://www.deeplearningbook.org/contents/convnets.html
• A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs
• Pooling helps detect the existence of features, rather than where a feature is, by making the representation invariant to small translations in the input
Pooling and strides
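A max-pooling step with a stride is just a block-wise summary; a numpy sketch (ours) of 2×2 max pooling with stride 2:

import numpy as np

fm = np.arange(16.0).reshape(4, 4)                # hypothetical 4x4 feature map
pooled = fm.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max of each 2x2 block
print(pooled)                                     # the pooled 2x2 map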
Feature extraction
• Feature extraction layers extract features in three stages:
  • The first stage performs several convolutions in parallel to produce a set of linear activations
  • In the second stage (the detector), each linear activation is run through a nonlinear activation function, such as ReLU
  • The third stage performs pooling on the output
• In the end, fully connected layers perform discrimination tasks on the enriched data
[Figures: feature extraction; convolutional neural networks (CNNs); convolutions; pooling output; the full convolutional neural network structure.]
Gluon code

num_fc = 512
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
    net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
    # The Flatten layer collapses all axes, except the first one, into one axis.
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(num_fc, activation="relu"))
    net.add(gluon.nn.Dense(num_outputs))
What’s new—GluonCV
• A deep learning toolkit for computer vision
• Features
• Training scripts that reproduce SOTA results reported in latest papers
• A large set of pre-trained models
• Carefully designed APIs and easy to understand implementations
• Community support
What’s new—GluonNLP
• A deep learning toolkit for natural language processing
• Features
• Training scripts to reproduce SOTA results reported in research papers
• Pre-trained models for common NLP tasks
• Carefully designed APIs that greatly reduce the implementation complexity
• Community support
What’s new—Keras backend
Instance Type   GPUs  Batch Size  Keras-MXNet (img/sec)  Keras-TensorFlow (img/sec)
C5.18X Large    0     32          13                     4
P3.8X Large     1     32          194                    184
P3.8X Large     4     128         764                    393
P3.16X Large    8     256         1068                   261

Instance Type   GPUs  Batch Size  Keras-MXNet (img/sec)  Keras-TensorFlow (img/sec)
C5.X Large      0     32          5.79                   3.27
C5.8X Large     0     32          27.9                   18.2

https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
What’s new—Sockeye
• A seq2seq toolkit based on MXNet
• Features
• Beam search inference
• Easy ensembling of multiple models
• Residual connections between RNN layers (Wu et al., 2016) [deep LSTM with parallelism]
• Lexical biasing of output layer predictions (Arthur et al., 2016) [low frequency words]
• Modeling coverage (Tu et al., 2016) [keeping attention history to reduce over and under translation]
• Context gating (Tu et al., 2017) [improving adequacy of translation by controlling ratios of source and target context]
• Cross-entropy label smoothing (e.g., Pereyra et al., 2017)
• Layer normalization (Ba et al., 2016) [improving training time]
• Multiple supported attention mechanisms [dot, mlp, bilinear, multihead-dot, encoder last state,
location]
• Multiple model architectures (encoder-decoder [Wu et al., 2016], convolutional [Gehring et al, 2017],
transformer [Vaswani et al, 2017])
Inference efficiency—TensorRT
Model Name               Relative TRT Speedup   Hardware
Resnet 101               1.99x                  Titan V
Resnet 50                1.76x                  Titan V
Resnet 18                1.54x                  Jetson TX1
cifar_resnext29_16x64d   1.26x                  Titan V
cifar_resnet20_v2        1.21x                  Titan V
Resnet 18                1.8x                   Titan V
Alexnet                  1.4x                   Titan V
https://cwiki.apache.org/confluence/display/MXNET/How+to+use+MXNet-TensorRT+integration
Inference efficiency—NNVM
https://aws.amazon.com/blogs/machine-learning/introducing-nnvm-compiler-a-new-open-end-to-end-compiler-for-ai-frameworks/
Portability—NNVM
https://aws.amazon.com/blogs/machine-learning/introducing-nnvm-compiler-a-new-open-end-to-end-compiler-for-ai-frameworks/
Portability—ONNX
[Figure: an ONNX model file packages the model parameters and hyperparameters so the model can move between frameworks.]
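A sketch of exporting a saved MXNet model to ONNX with the contrib exporter; the file names and input shape below are illustrative assumptions:

import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

onnx_mxnet.export_model(
    sym='model-symbol.json',          # saved network definition
    params='model-0000.params',       # saved weights
    input_shape=[(1, 3, 224, 224)],   # one RGB 224x224 image
    input_type=np.float32,
    onnx_file_path='model.onnx')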
Deployment with Amazon SageMaker
[Figure: the ML hosting service. Model artifacts plus an inference image from Amazon ECR define model versions (versions of the same inference code saved in inference containers). An EndpointConfiguration assigns traffic weights to ProductionVariants behind a single inference endpoint; Prod is the primary one, and 50% of the traffic must be served there! One click!]

Example ProductionVariant:

InstanceType: c3.4xlarge
InitialInstanceCount: 3
ModelName: prod
VariantName: primary
InitialVariantWeight: 50
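Putting it together with the SageMaker Python SDK; a sketch in which the training script, role, S3 path, and instance types are illustrative assumptions:

import sagemaker
from sagemaker.mxnet import MXNet

estimator = MXNet(entry_point='train.py',                 # your training script
                  role=sagemaker.get_execution_role(),
                  train_instance_count=1,
                  train_instance_type='ml.p3.2xlarge',
                  framework_version='1.3.0')
estimator.fit('s3://bucket/train-data')                   # hypothetical S3 path

predictor = estimator.deploy(initial_instance_count=1,    # one-click hosting
                             instance_type='ml.m4.xlarge')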
Recap
Thank you!
Cyrus M. Vahid
CyrusMV@amazon.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Mais conteúdo relacionado

Mais procurados

Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Amazon Web Services
 
AWS Black Belt Online Seminar 2017 AWS OpsWorks
AWS Black Belt Online Seminar 2017 AWS OpsWorksAWS Black Belt Online Seminar 2017 AWS OpsWorks
AWS Black Belt Online Seminar 2017 AWS OpsWorksAmazon Web Services Japan
 
AWS Black Belt Online Seminar 2017 AWS X-Ray
AWS Black Belt Online Seminar 2017 AWS X-RayAWS Black Belt Online Seminar 2017 AWS X-Ray
AWS Black Belt Online Seminar 2017 AWS X-RayAmazon Web Services Japan
 
Enforcing security invariants with AWS Organizations - SDD314 - AWS re:Inforc...
Enforcing security invariants with AWS Organizations - SDD314 - AWS re:Inforc...Enforcing security invariants with AWS Organizations - SDD314 - AWS re:Inforc...
Enforcing security invariants with AWS Organizations - SDD314 - AWS re:Inforc...Amazon Web Services
 
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...Amazon Web Services
 
20190319 AWS Black Belt Online Seminar Amazon FSx for Lustre
20190319 AWS Black Belt Online Seminar Amazon FSx for Lustre20190319 AWS Black Belt Online Seminar Amazon FSx for Lustre
20190319 AWS Black Belt Online Seminar Amazon FSx for LustreAmazon Web Services Japan
 
Networking Many VPCs: Transit and Shared Architectures - NET404 - re:Invent 2017
Networking Many VPCs: Transit and Shared Architectures - NET404 - re:Invent 2017Networking Many VPCs: Transit and Shared Architectures - NET404 - re:Invent 2017
Networking Many VPCs: Transit and Shared Architectures - NET404 - re:Invent 2017Amazon Web Services
 
AWS Black Belt Online Seminar 2017 AWSへのネットワーク接続とAWS上のネットワーク内部設計
AWS Black Belt Online Seminar 2017 AWSへのネットワーク接続とAWS上のネットワーク内部設計AWS Black Belt Online Seminar 2017 AWSへのネットワーク接続とAWS上のネットワーク内部設計
AWS Black Belt Online Seminar 2017 AWSへのネットワーク接続とAWS上のネットワーク内部設計Amazon Web Services Japan
 
AWS Black Belt Tech シリーズ 2015 AWS Device Farm
AWS Black Belt Tech シリーズ 2015 AWS Device FarmAWS Black Belt Tech シリーズ 2015 AWS Device Farm
AWS Black Belt Tech シリーズ 2015 AWS Device FarmAmazon Web Services Japan
 
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
20190220 AWS Black Belt Online Seminar Amazon S3 / GlacierAmazon Web Services Japan
 
20180322 AWS Black Belt Online Seminar AWS Snowball Edge
20180322 AWS Black Belt Online Seminar AWS Snowball Edge20180322 AWS Black Belt Online Seminar AWS Snowball Edge
20180322 AWS Black Belt Online Seminar AWS Snowball EdgeAmazon Web Services Japan
 
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNSAWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNSAmazon Web Services Japan
 
AWS Black Belt Online Seminar 2017 Amazon Kinesis
AWS Black Belt Online Seminar 2017 Amazon KinesisAWS Black Belt Online Seminar 2017 Amazon Kinesis
AWS Black Belt Online Seminar 2017 Amazon KinesisAmazon Web Services Japan
 
AWS Summit Seoul 2023 | 통합을 통한 보안 간소화
AWS Summit Seoul 2023 | 통합을 통한 보안 간소화AWS Summit Seoul 2023 | 통합을 통한 보안 간소화
AWS Summit Seoul 2023 | 통합을 통한 보안 간소화Amazon Web Services Korea
 
(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014Amazon Web Services
 
Deep dive ECS & Fargate Deep Dive
Deep dive ECS & Fargate Deep DiveDeep dive ECS & Fargate Deep Dive
Deep dive ECS & Fargate Deep DiveAmazon Web Services
 
K8s on AWS: Introducing Amazon EKS
K8s on AWS: Introducing Amazon EKSK8s on AWS: Introducing Amazon EKS
K8s on AWS: Introducing Amazon EKSAmazon Web Services
 
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기Amazon Web Services Korea
 
[CTO Night & Day 2019] Amazon Pinpoint でかゆいところに手が届くユーザー動向分析とセグメント通知 #ctonight
[CTO Night & Day 2019] Amazon Pinpoint でかゆいところに手が届くユーザー動向分析とセグメント通知 #ctonight[CTO Night & Day 2019] Amazon Pinpoint でかゆいところに手が届くユーザー動向分析とセグメント通知 #ctonight
[CTO Night & Day 2019] Amazon Pinpoint でかゆいところに手が届くユーザー動向分析とセグメント通知 #ctonightAmazon Web Services Japan
 
Build a Visual Search Engine Using Amazon SageMaker and AWS Fargate (AIM341) ...
Build a Visual Search Engine Using Amazon SageMaker and AWS Fargate (AIM341) ...Build a Visual Search Engine Using Amazon SageMaker and AWS Fargate (AIM341) ...
Build a Visual Search Engine Using Amazon SageMaker and AWS Fargate (AIM341) ...Amazon Web Services
 

Mais procurados (20)

Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
Serverless Design Patterns for Rethinking Traditional Enterprise Application ...
 
AWS Black Belt Online Seminar 2017 AWS OpsWorks
AWS Black Belt Online Seminar 2017 AWS OpsWorksAWS Black Belt Online Seminar 2017 AWS OpsWorks
AWS Black Belt Online Seminar 2017 AWS OpsWorks
 
AWS Black Belt Online Seminar 2017 AWS X-Ray
AWS Black Belt Online Seminar 2017 AWS X-RayAWS Black Belt Online Seminar 2017 AWS X-Ray
AWS Black Belt Online Seminar 2017 AWS X-Ray
 
Enforcing security invariants with AWS Organizations - SDD314 - AWS re:Inforc...
Enforcing security invariants with AWS Organizations - SDD314 - AWS re:Inforc...Enforcing security invariants with AWS Organizations - SDD314 - AWS re:Inforc...
Enforcing security invariants with AWS Organizations - SDD314 - AWS re:Inforc...
 
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
Building Your Own ML Application with AWS Lambda and Amazon SageMaker (SRV404...
 
20190319 AWS Black Belt Online Seminar Amazon FSx for Lustre
20190319 AWS Black Belt Online Seminar Amazon FSx for Lustre20190319 AWS Black Belt Online Seminar Amazon FSx for Lustre
20190319 AWS Black Belt Online Seminar Amazon FSx for Lustre
 
Networking Many VPCs: Transit and Shared Architectures - NET404 - re:Invent 2017
Networking Many VPCs: Transit and Shared Architectures - NET404 - re:Invent 2017Networking Many VPCs: Transit and Shared Architectures - NET404 - re:Invent 2017
Networking Many VPCs: Transit and Shared Architectures - NET404 - re:Invent 2017
 
AWS Black Belt Online Seminar 2017 AWSへのネットワーク接続とAWS上のネットワーク内部設計
AWS Black Belt Online Seminar 2017 AWSへのネットワーク接続とAWS上のネットワーク内部設計AWS Black Belt Online Seminar 2017 AWSへのネットワーク接続とAWS上のネットワーク内部設計
AWS Black Belt Online Seminar 2017 AWSへのネットワーク接続とAWS上のネットワーク内部設計
 
AWS Black Belt Tech シリーズ 2015 AWS Device Farm
AWS Black Belt Tech シリーズ 2015 AWS Device FarmAWS Black Belt Tech シリーズ 2015 AWS Device Farm
AWS Black Belt Tech シリーズ 2015 AWS Device Farm
 
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
20190220 AWS Black Belt Online Seminar Amazon S3 / Glacier
 
20180322 AWS Black Belt Online Seminar AWS Snowball Edge
20180322 AWS Black Belt Online Seminar AWS Snowball Edge20180322 AWS Black Belt Online Seminar AWS Snowball Edge
20180322 AWS Black Belt Online Seminar AWS Snowball Edge
 
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNSAWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
 
AWS Black Belt Online Seminar 2017 Amazon Kinesis
AWS Black Belt Online Seminar 2017 Amazon KinesisAWS Black Belt Online Seminar 2017 Amazon Kinesis
AWS Black Belt Online Seminar 2017 Amazon Kinesis
 
AWS Summit Seoul 2023 | 통합을 통한 보안 간소화
AWS Summit Seoul 2023 | 통합을 통한 보안 간소화AWS Summit Seoul 2023 | 통합을 통한 보안 간소화
AWS Summit Seoul 2023 | 통합을 통한 보안 간소화
 
(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
 
Deep dive ECS & Fargate Deep Dive
Deep dive ECS & Fargate Deep DiveDeep dive ECS & Fargate Deep Dive
Deep dive ECS & Fargate Deep Dive
 
K8s on AWS: Introducing Amazon EKS
K8s on AWS: Introducing Amazon EKSK8s on AWS: Introducing Amazon EKS
K8s on AWS: Introducing Amazon EKS
 
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기
AWS Summit Seoul 2023 | 생성 AI 모델의 임베딩 벡터를 이용한 서버리스 추천 검색 구현하기
 
[CTO Night & Day 2019] Amazon Pinpoint でかゆいところに手が届くユーザー動向分析とセグメント通知 #ctonight
[CTO Night & Day 2019] Amazon Pinpoint でかゆいところに手が届くユーザー動向分析とセグメント通知 #ctonight[CTO Night & Day 2019] Amazon Pinpoint でかゆいところに手が届くユーザー動向分析とセグメント通知 #ctonight
[CTO Night & Day 2019] Amazon Pinpoint でかゆいところに手が届くユーザー動向分析とセグメント通知 #ctonight
 
Build a Visual Search Engine Using Amazon SageMaker and AWS Fargate (AIM341) ...
Build a Visual Search Engine Using Amazon SageMaker and AWS Fargate (AIM341) ...Build a Visual Search Engine Using Amazon SageMaker and AWS Fargate (AIM341) ...
Build a Visual Search Engine Using Amazon SageMaker and AWS Fargate (AIM341) ...
 

Semelhante a Build Deep Learning Applications Using MXNet and Amazon SageMaker (AIM418) - AWS re:Invent 2018

Building Applications with Apache MXNet
Building Applications with Apache MXNetBuilding Applications with Apache MXNet
Building Applications with Apache MXNetApache MXNet
 
AI Services for Developers | AWS Floor28
AI Services for Developers | AWS Floor28AI Services for Developers | AWS Floor28
AI Services for Developers | AWS Floor28Amazon Web Services
 
AI Services for Developers - Floor28
AI Services for Developers - Floor28AI Services for Developers - Floor28
AI Services for Developers - Floor28Boaz Ziniman
 
The Future of AI on AWS
The Future of AI on AWSThe Future of AI on AWS
The Future of AI on AWSBoaz Ziniman
 
Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringAmazon Web Services
 
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...Amazon Web Services
 
AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)Julien SIMON
 
Introduction to AI services for Developers - Builders Day Israel
Introduction to AI services for Developers - Builders Day IsraelIntroduction to AI services for Developers - Builders Day Israel
Introduction to AI services for Developers - Builders Day IsraelAmazon Web Services
 
Introduction to AI services for Developers - Builders Day Israel
Introduction to AI services for Developers - Builders Day IsraelIntroduction to AI services for Developers - Builders Day Israel
Introduction to AI services for Developers - Builders Day IsraelAmazon Web Services
 
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...Amazon Web Services
 
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalisere:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon PersonaliseAmazon Web Services
 
An Introduction to Reinforcement Learning with Amazon SageMaker
An Introduction to Reinforcement Learning with Amazon SageMakerAn Introduction to Reinforcement Learning with Amazon SageMaker
An Introduction to Reinforcement Learning with Amazon SageMakerAmazon Web Services
 
An Introduction to Reinforcement Learning (December 2018)
An Introduction to Reinforcement Learning (December 2018)An Introduction to Reinforcement Learning (December 2018)
An Introduction to Reinforcement Learning (December 2018)Julien SIMON
 
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...Amazon Web Services
 
Accelerate ML Training on Amazon SageMaker Using GPU-Based EC2 P3 Instances (...
Accelerate ML Training on Amazon SageMaker Using GPU-Based EC2 P3 Instances (...Accelerate ML Training on Amazon SageMaker Using GPU-Based EC2 P3 Instances (...
Accelerate ML Training on Amazon SageMaker Using GPU-Based EC2 P3 Instances (...Amazon Web Services
 
Accelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMakerAccelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMakerAmazon Web Services
 
Building a Recommender System on AWS
Building a Recommender System on AWSBuilding a Recommender System on AWS
Building a Recommender System on AWSAmazon Web Services
 

Semelhante a Build Deep Learning Applications Using MXNet and Amazon SageMaker (AIM418) - AWS re:Invent 2018 (20)

Building Applications with Apache MXNet
Building Applications with Apache MXNetBuilding Applications with Apache MXNet
Building Applications with Apache MXNet
 
Deep Learning with MXNet
Deep Learning with MXNetDeep Learning with MXNet
Deep Learning with MXNet
 
AI Services for Developers | AWS Floor28
AI Services for Developers | AWS Floor28AI Services for Developers | AWS Floor28
AI Services for Developers | AWS Floor28
 
AI Services for Developers - Floor28
AI Services for Developers - Floor28AI Services for Developers - Floor28
AI Services for Developers - Floor28
 
The Future of AI on AWS
The Future of AI on AWSThe Future of AI on AWS
The Future of AI on AWS
 
Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos Engineering
 
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...
[NEW LAUNCH!] Introducing Amazon SageMaker RL - Build and Train Reinforcement...
 
AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)AWS re:Invent 2018 - Machine Learning recap (December 2018)
AWS re:Invent 2018 - Machine Learning recap (December 2018)
 
Introduction to AI services for Developers - Builders Day Israel
Introduction to AI services for Developers - Builders Day IsraelIntroduction to AI services for Developers - Builders Day Israel
Introduction to AI services for Developers - Builders Day Israel
 
Introduction to AI services for Developers - Builders Day Israel
Introduction to AI services for Developers - Builders Day IsraelIntroduction to AI services for Developers - Builders Day Israel
Introduction to AI services for Developers - Builders Day Israel
 
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...
Run Production Workloads on Spot, Save up to 90% (CMP306-R1) - AWS re:Invent ...
 
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalisere:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
re:Invent Deep Dive on Amazon SageMaker, Amazon Forecast and Amazon Personalise
 
Machine Learning in Practice
Machine Learning in PracticeMachine Learning in Practice
Machine Learning in Practice
 
An Introduction to Reinforcement Learning with Amazon SageMaker
An Introduction to Reinforcement Learning with Amazon SageMakerAn Introduction to Reinforcement Learning with Amazon SageMaker
An Introduction to Reinforcement Learning with Amazon SageMaker
 
An Introduction to Reinforcement Learning (December 2018)
An Introduction to Reinforcement Learning (December 2018)An Introduction to Reinforcement Learning (December 2018)
An Introduction to Reinforcement Learning (December 2018)
 
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
Accelerate Machine Learning with Ease Using Amazon SageMaker - BDA301 - Chica...
 
Amazon SageMaker In Action
Amazon SageMaker In Action Amazon SageMaker In Action
Amazon SageMaker In Action
 
Accelerate ML Training on Amazon SageMaker Using GPU-Based EC2 P3 Instances (...
Accelerate ML Training on Amazon SageMaker Using GPU-Based EC2 P3 Instances (...Accelerate ML Training on Amazon SageMaker Using GPU-Based EC2 P3 Instances (...
Accelerate ML Training on Amazon SageMaker Using GPU-Based EC2 P3 Instances (...
 
Accelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMakerAccelerate Machine Learning with Ease using Amazon SageMaker
Accelerate Machine Learning with Ease using Amazon SageMaker
 
Building a Recommender System on AWS
Building a Recommender System on AWSBuilding a Recommender System on AWS
Building a Recommender System on AWS
 

Mais de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Build Deep Learning Applications Using MXNet and Amazon SageMaker (AIM418) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build Deep Learning Applications Using MXNet and Amazon SageMaker Cyrus M. Vahid Principal Evangelist at AWL AI Labs Amazon Web Services – Deep Engine A I M 4 1 8
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Machine learning Deep learning Multi-layer perceptron Convolutional neural networks Gluon
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Intelligence Would you kindly tell me if you have the phone number of the queen?
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Intelligence [Ω → Ε ∧ Η ∨ (Γ ∨ ~Γ)] ∧ (Φ ∧ ~Φ) ∴ 𝐹 Would you kindly tell me if you have the phone number of the queen?
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Intelligence [Ω → Ε ∧ Η ∨ (Γ ∨ ~Γ)] ∧ (Φ ∧ ~Φ) ∴ 𝐹 Would you kindly tell me if you have the phone number of the queen? The Spanish King officially abdicated in ... of his …, Felipe. Felipe will be confirmed tomorrow as the new Spanish ... .
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Intelligence [Ω → Ε ∧ Η ∨ (Γ ∨ ~Γ)] ∧ (Φ ∧ ~Φ) ∴ 𝐹 Would you kindly tell me if you have the phone number of the queen? The Spanish King officially abdicated in favour of his son, Felipe. Felipe will be confirmed tomorrow as the new Spanish King.
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Intelligence [Ω → Ε ∧ Η ∨ (Γ ∨ ~Γ)] ∧ (Φ ∧ ~Φ) ∴ 𝐹 Would you kindly tell me if you have the phone number of the queen? In our quest to implement perfect NLP tools, we have developed state of the art RNNs. Now we can use them to …. (Jeoffy Hinton – Coursera) The Spanish King officially abdicated in favour of his son, Felipe. Felipe will be confirmed tomorrow as the new Spanish King.
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Intelligence [Ω → Ε ∧ Η ∨ (Γ ∨ ~Γ)] ∧ (Φ ∧ ~Φ) ∴ 𝐹 Would you kindly tell me if you have the phone number of the queen? The Spanish King officially abdicated in favour of his son, Felipe. Felipe will be confirmed tomorrow as the new Spanish King. In our quest to implement perfect NLP tools, we have developed state of the art RNNs. Now we can use them to wreck a nice beach. (Jeoffy Hinton – Coursera)
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Biological learning Source: http://cs231n.github.io/neural-networks-1/
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Perceptron I1 I2 B O w1 w2 w3 𝑓 𝑥𝑖, 𝑤𝑖 = Φ(𝑏 + Σ𝑖(𝑤𝑖. 𝑥𝑖)) Φ 𝑥 = ቊ 1, 𝑖𝑓 𝑥 ≥ 0.5 0, 𝑖𝑓 𝑥 < 0.5
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Perceptron I1 I2 B O 1 1 -1.5 𝑂1 = 1𝑥1 + 1𝑥1 + −1.5 = 0.5 ∴ Φ(𝑂1) = 1 𝐼1 = 𝐼2 = 𝐵1 = 1 𝑂1 = 1𝑥1 + 0𝑥1 + −1.5 = −0.5 ∴ Φ(𝑂1) = 0 𝐼2 = 0 ; 𝐼1 = 𝐵1 = 1 P Q P ∧ Q T T T T F F F T F F F F P Q x0 0 0
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Non-linear space P Q x0 0 0 P Q x0 x 0 P Q P ∧ Q P ⨁ Q T T T T T F F F F T F F F F F T
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Deep learning Hidden layers Input layer Output
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. The “learning” in deep learning 0.4 0.3 0.2 0.9 ... backpropagation (gradient descent) ො𝑦 != ො𝑦 0.4 ± 𝛿 0.3 ± 𝛿 new weights new weights 0 1 0 1 1 . . . X input label ... ො𝑦
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Universal function approximation • Let 𝜙 . 𝑏𝑒 𝑎 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡, 𝑏𝑜𝑢𝑛𝑑𝑒𝑑, 𝑎𝑛𝑑 𝑚𝑜𝑛𝑜𝑡𝑖𝑐𝑎𝑙𝑙𝑦 𝑖𝑛𝑐𝑟𝑒𝑎𝑠𝑖𝑛𝑔 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 • 𝐿𝑒𝑡 𝐼 𝑚 𝑑𝑒𝑛𝑜𝑡𝑒 𝑡ℎ𝑒 𝑚 𝑑𝑖𝑚𝑒𝑛𝑠𝑖𝑜𝑛𝑎𝑙 𝑢𝑛𝑖𝑡 ℎ𝑦𝑝𝑒𝑟𝑐𝑢𝑏𝑒 0,1 𝑚. 𝑇ℎ𝑒 𝑠𝑝𝑎𝑐𝑒 𝑜𝑓 𝑐𝑜𝑛𝑡𝑖𝑛𝑢𝑜𝑢𝑠 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛𝑠 𝑜𝑛 𝐼 𝑚 𝑖𝑠 𝑑𝑒𝑛𝑜𝑡𝑒𝑑 𝑏𝑦 𝐶 𝐼 𝑚 . • 𝑇ℎ𝑒𝑛, 𝑔𝑖𝑣𝑒𝑛 𝜖 > 0 𝑎𝑛𝑑 𝑎𝑛𝑦 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑓𝜖𝐶 𝐼 𝑚 , 𝑡ℎ𝑒𝑟𝑒 𝑒𝑥𝑖𝑠𝑡𝑠 𝑎𝑛 𝑖𝑛𝑡𝑒𝑔𝑒𝑟 𝑁, 𝑟𝑒𝑎𝑙 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡𝑠 𝑣𝑖, 𝑏𝑖 𝜖ℝ 𝑎𝑛𝑑 𝑟𝑒𝑎𝑙 𝑣𝑒𝑐𝑡𝑜𝑟𝑠 𝑤𝑖 𝜖ℝ 𝑚 , 𝑤ℎ𝑒𝑟𝑒 𝑖 = 1, 2, … , 𝑁, 𝑠𝑢𝑐ℎ 𝑡ℎ𝑎𝑡 𝑤𝑒 𝑚𝑎𝑦 𝑑𝑒𝑓𝑖𝑛𝑒 𝐹 𝑥 = ෍ 𝑖=1 𝑁 𝑣𝑖 𝜙(𝑤𝑖 𝑇 𝑥 + 𝑏𝑖 ) 𝑎𝑠 𝑎𝑛 𝑎𝑝𝑝𝑟𝑜𝑥𝑖𝑚𝑎𝑡𝑖𝑜𝑛 𝑟𝑒𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛 𝑜𝑓𝑐 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑓 𝑤ℎ𝑒𝑟𝑒 𝑖𝑠 𝑖𝑛𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡 𝑜𝑓 𝜙; 𝑡ℎ𝑎𝑡 𝑖𝑠 𝐹 𝑥 − 𝑓 𝑥 < 𝜖 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑥 𝑖𝑛 𝐼 𝑚
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Activation functions
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Gradient descent • After training over data we sill have an error surface
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Gradient descent • After training over data we sill have an error surface • The goal of optimization is to reach the minima of the surface, and thus reducing error
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Gradient descent • Loss function, 𝐽, is a measure of how well an algorithm models a dataset • There are several loss functions and one can combine them. Some of the more popular loss functions are RMST, hinge, L1, L2, … • For more information please check: https://tinyurl.com/y7c6ub5k
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Gradient descent • Loss function, 𝐽, is a measure of how well an algorithm models a dataset • Weights are adjusted in the opposite direction of calculated gradients Learning rate Gradient 𝛼 𝜕𝐽 𝜃 𝜕𝜃𝑗
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Non-convex error surface • 𝑓: ℝ 𝑛 → ℝ 𝑖𝑠 𝑐𝑜𝑛𝑣𝑒𝑥 𝑖𝑓 𝑎𝑛𝑑 𝑖𝑓 ∀ 𝑥1, 𝑥2 𝜖ℝ 𝑛 , 𝑎𝑛𝑑 ∀𝜆𝜖 0,1 : • 𝑓 𝜆𝑥1 + 1 − 𝜆 𝑥2 ≤ 𝜆𝑓 𝑥1 + 1 − 𝜆 𝑓(𝑥2) • With a convex objective and a convex feasible region, there can be only one optimal solution (globally optimal) • Non-convex optimization problem may have multiple feasible regions and multiple locally optimal points within each region • It can take time exponential to determine there is no solution, an optimal solution exists, or objective function is unbounded Global Optimum Global Optimum Local Optimum
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Non-convex error surface • In deep learning we almost exclusively need to solve a complex non-convex optimization problem in an n- dimensional vector space
  • 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Recap • A neural network with at least one hidden layer can approximate any function • Training a network (backpropagation) consists of: • Initializing weights at “random” • Compute the network forward (forward pass) • Reduce loss by updating weights in opposite direction of gradient of the loss function • Repeat the process until an optimized set of weights are calculated • The optimization is complicated and computationally very intensive due to non-convexity of the optimization space
  • 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Minibatch training • Updating millions of weights at each pass is inefficient (online) • Updating weights only at the end of each run over all data is not effective (batch) • We use minibatch training to capture the best of both worlds (see the sketch below) • An epoch is one forward and backward pass over all of the data • Batch size is the number of training examples in one forward/backward pass • https://tinyurl.com/yc2l63lq • https://tinyurl.com/yaof5axr
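A sketch of the arithmetic: with N examples and a given batch size, one epoch performs ⌈N / batch_size⌉ weight updates, sitting between the online (batch_size = 1) and full-batch (batch_size = N) extremes. The random data below is a stand-in for a real dataset:

    import mxnet as mx

    N, batch_size = 60000, 64
    X = mx.nd.random.uniform(shape=(N, 784))
    y = mx.nd.random.randint(0, 10, shape=(N,)).astype('float32')

    dataset = mx.gluon.data.ArrayDataset(X, y)
    loader = mx.gluon.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    print(len(loader))   # 938 minibatch updates per epoch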
  • 26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Learning rate adjustment • If the learning rate is too large, training will not converge due to oscillation • If the learning rate is too small, convergence will take a very long time • In state-of-the-art systems it is common to use a learning rate scheduler. For more information please refer to: • https://tinyurl.com/y9mcfvjf • https://tinyurl.com/ybxyncgs https://tinyurl.com/qfp2kfq
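One concrete scheduler from the MXNet API, as a hedged sketch (the step size, factor, and stand-in network are illustrative): FactorScheduler halves the learning rate every 1,000 updates when attached to the optimizer.

    import mxnet as mx
    from mxnet import gluon

    schedule = mx.lr_scheduler.FactorScheduler(step=1000, factor=0.5)
    net = gluon.nn.Dense(10)   # stand-in network
    net.initialize()
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': 0.1, 'lr_scheduler': schedule})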
  • 27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Saddle points • When the partial derivatives with respect to all variables are zero, we have a critical point — possibly a saddle point • 𝑓(𝑥, 𝑦) = 𝑥² − 𝑦²; ∂f/∂x = 2x, ∂f/∂y = −2y • At (0,0), ∂f/∂x = ∂f/∂y = 0 • 𝑓(𝑥, 0) = 𝑥² has a local minimum at x = 0 • 𝑓(0, 𝑦) = −𝑦² has a local maximum at y = 0 • The result is a saddle point at (0,0): the gradient vanishes there, so gradient descent can stall • (0,0), as demonstrated in the picture, is not an optimum at all
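The saddle is easy to verify numerically with autograd; a small sketch showing that both partial derivatives vanish at (0, 0), even though the point is a minimum along x and a maximum along y:

    import mxnet as mx
    from mxnet import autograd

    x, y = mx.nd.array([0.0]), mx.nd.array([0.0])
    x.attach_grad(); y.attach_grad()
    with autograd.record():
        f = x ** 2 - y ** 2
    f.backward()
    print(x.grad, y.grad)   # both [0.]: a critical point, but a saddle, not an optimum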
  • 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Flavors of SGD http://ruder.io/optimizing-gradient-descent/index.html#gradientdescentvariants
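In Gluon, the SGD variants surveyed in the linked post are selected by name when the Trainer is created; a sketch with a stand-in one-layer network (hyperparameter values are illustrative):

    from mxnet import gluon

    net = gluon.nn.Dense(10)   # stand-in network
    net.initialize()

    vanilla  = gluon.Trainer(net.collect_params(), 'sgd',  {'learning_rate': 0.1})
    momentum = gluon.Trainer(net.collect_params(), 'sgd',  {'learning_rate': 0.1, 'momentum': 0.9})
    adam     = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 0.001})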
  • 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Overfitting • Overfitting happens when the model learns the noise in the training data as well as the signal • This prevents the model from generalizing well to unseen data
  • 30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Overfitting • Overfitting happens when the model learns the noise in the training data as well as the signal • This prevents the model from generalizing well to unseen data • Overfitting can result from having too few data points, noisy data, or too large a network for the available data
  • 31. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Dropout and drop connect [diagrams: regular network, dropout, drop connect]
  • 32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Dropout and drop connect
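A hedged sketch of dropout in Gluon (layer sizes and the 0.5 rate are illustrative): nn.Dropout zeroes a random subset of activations during training, i.e., inside autograd.record(), and is a no-op at inference time. Drop connect removes weights rather than activations and has no built-in Gluon layer.

    import mxnet as mx
    from mxnet import autograd, gluon

    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(64, activation='relu'))
        net.add(gluon.nn.Dropout(0.5))   # drop half of the activations while training
        net.add(gluon.nn.Dense(10))
    net.initialize()

    x = mx.nd.random.uniform(shape=(4, 784))
    with autograd.record():
        train_out = net(x)   # dropout active
    test_out = net(x)        # dropout bypassed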
  • 33. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 34. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Computational dependency/graph • 𝑧 = 𝑥 ⋅ 𝑦 • 𝑘 = 𝑎 ⋅ 𝑏 • 𝑡 = 𝜆𝑧 + 𝑘 [graph: x and y feed a multiply node producing z; a and b feed a multiply node producing k; λz and k feed an add node producing t]
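The same graph expressed with MXNet symbols, as a sketch (λ is fixed to 2.0 and the input values are arbitrary); because z and k do not depend on each other, the engine is free to evaluate them in parallel before the final addition:

    import mxnet as mx

    x, y = mx.sym.Variable('x'), mx.sym.Variable('y')
    a, b = mx.sym.Variable('a'), mx.sym.Variable('b')
    lam = 2.0            # lambda, a constant in this sketch
    z = x * y            # independent of k
    k = a * b            # independent of z
    t = lam * z + k      # depends on both

    ex = t.bind(mx.cpu(), {n: mx.nd.array([v]) for n, v in
                           {'x': 1, 'y': 2, 'a': 3, 'b': 4}.items()})
    print(ex.forward())  # [2.0*(1*2) + 3*4] = [16.]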
  • 35. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Computational dependency/graph

    net = mx.sym.Variable('data')
    net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
    net = mx.sym.Activation(net, name='relu1', act_type="relu")
    net = mx.sym.FullyConnected(net, name='fc2', num_hidden=10)
    net = mx.sym.SoftmaxOutput(net, name='softmax')
  • 37. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Training

    import logging
    logging.getLogger().setLevel(logging.DEBUG)  # logging to stdout

    # create a trainable module on the compute context
    # (mlp is the symbol built above; ctx is e.g. mx.cpu() or mx.gpu())
    mlp_model = mx.mod.Module(symbol=mlp, context=ctx)
    mlp_model.fit(train_iter,
                  eval_data=val_iter,
                  optimizer='sgd',
                  optimizer_params={'learning_rate': 0.1},
                  eval_metric='acc',
                  batch_end_callback=mx.callback.Speedometer(batch_size, 100),
                  num_epoch=10)
  • 39. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 40. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Training efficiency—92% https://mxnet.incubator.apache.org/tutorials/vision/large_scale_classification.html
  • 41. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. End-to-end machine learning platform Zero setup Flexible model training Pay by the second $ Amazon SageMaker Build, train, and deploy machine learning models at scale
  • 42. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon SageMaker and distributed training • Faster training through Amazon SageMaker streaming for custom algorithms • Boilerplate code for your algorithms to train over a cluster [chart: PCA benchmark]

    if len(hosts) == 1:
        kvstore = 'device' if num_gpus > 0 else 'local'
    else:
        kvstore = 'dist_device_sync' if num_gpus > 0 else 'dist_sync'
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': learning_rate, 'momentum': momentum},
                            kvstore=kvstore)
  • 43. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Training code • Amazon-provided algorithms: matrix factorization, regression, principal component analysis, K-means clustering, gradient boosted trees, and more! • Bring your own script (IM builds the container) • Bring your own algorithm (you build the container) • IM estimators in Apache Spark • CPU, GPU, HPO, distributed training • Fetch training data, save model artifacts, save the inference image to Amazon ECR — fully managed, secured
  • 44. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Automatic model tuning • Amazon SageMaker built-in algorithms: factorization machines, regression/classification, principal component analysis, K-means clustering, XGBoost, DeepAR, and more • Bring your own script (prebuilt containers) • Bring your own algorithm • Fetch training data, save model artifacts — fully managed, secured, with automatic model tuning
  • 45. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 46. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Evolution of deep learning frameworks
  • 47. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why Gluon? Simple, easy-to- understand code Flexible, imperative structure Dynamic graphs High performance
  • 48. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Define the network

    net = gluon.nn.HybridSequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(units=64, activation='relu'))
        net.add(gluon.nn.Dense(units=10))
  • 49. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx, force_reinit=True) Initialize the model
  • 50. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss() Loss function
  • 51. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Choose an optimizer trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.02})
  • 52. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Load the data

    mnist = mx.test_utils.get_mnist()
    batch_size = 64
    num_inputs = 784
    num_outputs = 10

    def transform(data, label):
        return data.astype(np.float32)/255, label.astype(np.float32)

    train_data = mx.gluon.data.DataLoader(
        mx.gluon.data.vision.MNIST(train=True, transform=transform),
        batch_size, shuffle=True)
    test_data = mx.gluon.data.DataLoader(
        mx.gluon.data.vision.MNIST(train=False, transform=transform),
        batch_size, shuffle=False)
  • 53. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Training

    for e in range(10):
        cumulative_loss = 0
        for i, (data, label) in enumerate(train_data):
            data = data.as_in_context(model_ctx).reshape((-1, 784))
            label = label.as_in_context(model_ctx)
            with autograd.record():
                output = net(data)
                loss = softmax_cross_entropy(output, label)
            loss.backward()
            trainer.step(data.shape[0])
            cumulative_loss += nd.sum(loss).asscalar()  # accumulate the epoch loss
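The deck stops at the training loop; a hedged sketch of the matching evaluation step, reusing net, test_data, and model_ctx from the slides above together with mx.metric.Accuracy:

    import mxnet as mx

    def evaluate_accuracy(data_iterator, net):
        acc = mx.metric.Accuracy()
        for data, label in data_iterator:
            data = data.as_in_context(model_ctx).reshape((-1, 784))
            label = label.as_in_context(model_ctx)
            output = net(data)
            predictions = mx.nd.argmax(output, axis=1)
            acc.update(preds=[predictions], labels=[label])
        return acc.get()[1]

    print(evaluate_accuracy(test_data, net))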
  • 54. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 55. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 56. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Spatial relatedness
  • 57. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Spatial relatedness
  • 58. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Convolution* • Convolution is a specialized kind of linear operation • We use a reduction mechanism that weights inputs differently based on relevance • Example: measuring the location of a spaceship along its trajectory yields a discrete set of measurements. Each one may be fuzzy, but averaging them removes noise, and giving more weight to recent (local) measurements produces a better estimate of the current location • 𝑥 is often called the input (often a multi-dimensional array of data) and w is called the kernel (often a multi-dimensional array of parameters)
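For reference, the discrete convolution behind the spaceship example (1-D case): each measurement x(a) is weighted by the kernel w according to how far a lies from the point of interest t,

    s(t) = (x ∗ w)(t) = Σₐ x(a) · w(t − a)

so a kernel that peaks near t − a = 0 realizes exactly the “more weight to local measurements” averaging described above.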
  • 59. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Convolution*
  • 60. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Convolution*
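A hedged sketch of a single 2-D convolution in Gluon (channel counts and sizes are illustrative): a 3×3 kernel sliding over a 1-channel 8×8 input with stride 1 and no padding yields a 6×6 feature map.

    import mxnet as mx
    from mxnet import gluon

    conv = gluon.nn.Conv2D(channels=1, kernel_size=3)
    conv.initialize()
    img = mx.nd.random.uniform(shape=(1, 1, 8, 8))   # (batch, channel, height, width)
    print(conv(img).shape)                           # (1, 1, 6, 6)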
  • 61. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Pooling http://www.deeplearningbook.org/contents/convnets.html • A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs • Pooling helps detect the existence of features, rather than where a feature is, by making the representation invariant to small translations of the input
  • 62. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Pooling and strides
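A small worked example of 2×2 max pooling with stride 2 (values chosen by hand so the result is easy to check): each non-overlapping window keeps only its strongest response, halving both spatial dimensions.

    import mxnet as mx
    from mxnet import gluon

    pool = gluon.nn.MaxPool2D(pool_size=2, strides=2)
    x = mx.nd.array([[[[ 1,  2,  3,  4],
                       [ 5,  6,  7,  8],
                       [ 9, 10, 11, 12],
                       [13, 14, 15, 16]]]])   # shape (1, 1, 4, 4)
    print(pool(x))   # [[[[ 6,  8], [14, 16]]]], shape (1, 1, 2, 2)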
  • 63. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Feature extraction • Feature extraction layers extract features in stages: • The first stage performs several convolutions in parallel to produce a set of linear activations • In the second stage (the detector), each linear activation is run through a nonlinear activation function, such as ReLU • The third stage performs pooling on the output • Finally, fully connected layers perform discrimination tasks on the enriched data
  • 64. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Feature extraction
  • 65. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Convolutional neural networks (CNNs)
  • 66. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Convolutions
  • 67. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Pooling output
  • 68. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Full convolution neural network structure
  • 69. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Gluon code

    num_fc = 512
    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # The Flatten layer collapses all axes, except the first one, into one axis.
        net.add(gluon.nn.Flatten())
        net.add(gluon.nn.Dense(num_fc, activation="relu"))
        net.add(gluon.nn.Dense(num_outputs))  # num_outputs = 10 for MNIST, as defined earlier
  • 70. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 71. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 72. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What’s new—GluonCV • A deep learning toolkit for computer vision • Features • Training scripts that reproduce SOTA results reported in latest papers • A large set of pre-trained models • Carefully designed APIs and easy to understand implementations • Community support
  • 73. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What’s new—GluonNLP • A deep learning toolkit for natural language processing • Features • Training scripts to reproduce SOTA results reported in research papers • Pre-trained models for common NLP tasks • Carefully designed APIs that greatly reduce the implementation complexity • Community support
  • 74. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What’s new—Keras backend

    Instance Type | GPUs | Batch Size | Keras-MXNet (img/sec) | Keras-TensorFlow (img/sec)
    C5.18X Large  | 0    | 32         | 13                    | 4
    P3.8X Large   | 1    | 32         | 194                   | 184
    P3.8X Large   | 4    | 128        | 764                   | 393
    P3.16X Large  | 8    | 256        | 1068                  | 261

    Instance Type | GPUs | Batch Size | Keras-MXNet (img/sec) | Keras-TensorFlow (img/sec)
    C5.X Large    | 0    | 32         | 5.79                  | 3.27
    C5.8X Large   | 0    | 32         | 27.9                  | 18.2

    https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
  • 75. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What’s new—Sockeye • A seq2seq toolkit based on MXNet • Features: • Beam search inference • Easy ensembling of multiple models • Residual connections between RNN layers (Wu et al., 2016) [deep LSTMs with parallelism] • Lexical biasing of output layer predictions (Arthur et al., 2016) [low-frequency words] • Modeling coverage (Tu et al., 2016) [keeping attention history to reduce over- and under-translation] • Context gating (Tu et al., 2017) [improving translation adequacy by controlling the ratio of source and target context] • Cross-entropy label smoothing (e.g., Pereyra et al., 2017) • Layer normalization (Ba et al., 2016) [improving training time] • Multiple supported attention mechanisms [dot, mlp, bilinear, multihead-dot, encoder last state, location] • Multiple model architectures (encoder-decoder [Wu et al., 2016], convolutional [Gehring et al., 2017], transformer [Vaswani et al., 2017])
  • 76. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 77. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Inference efficiency—TensorRT

    Model Name             | Relative TRT Speedup | Hardware
    Resnet 101             | 1.99x                | Titan V
    Resnet 50              | 1.76x                | Titan V
    Resnet 18              | 1.54x                | Jetson TX1
    cifar_resnext29_16x64d | 1.26x                | Titan V
    cifar_resnet20_v2      | 1.21x                | Titan V
    Resnet 18              | 1.8x                 | Titan V
    Alexnet                | 1.4x                 | Titan V

    https://cwiki.apache.org/confluence/display/MXNET/How+to+use+MXNet-TensorRT+integration
  • 78. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Inference efficiency—NNVM https://aws.amazon.com/blogs/machine-learning/introducing-nnvm-compiler-a-new-open-end-to-end-compiler-for-ai-frameworks/
  • 79. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 80. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Portability—NNVM https://aws.amazon.com/blogs/machine-learning/introducing-nnvm-compiler-a-new-open-end-to-end-compiler-for-ai-frameworks/
  • 81. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Portability—ONNX [diagram: model, parameters, hyperparameters]
  • 82. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Deployment with Amazon SageMaker • ML hosting service, one-click • An EndpointConfiguration maps model versions (versions of the same inference code saved in inference containers, with inference images pulled from Amazon ECR alongside the model artifacts) onto ProductionVariants of an inference endpoint • Traffic is split by variant weights (e.g., 30/50/10/10; prod is the primary variant, so 50% of the traffic must be served there!) • Example variant:

    InstanceType: c3.4xlarge
    InitialInstanceCount: 3
    ModelName: prod
    VariantName: primary
    InitialVariantWeight: 50
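A hedged boto3 sketch of what the diagram configures (names are hypothetical; the hosting API expects an 'ml.'-prefixed instance type, so the slide's c3.4xlarge appears here as ml.c3.4xlarge — check the currently supported types):

    import boto3

    sm = boto3.client('sagemaker')
    sm.create_endpoint_config(
        EndpointConfigName='demo-endpoint-config',   # hypothetical name
        ProductionVariants=[{
            'VariantName': 'primary',
            'ModelName': 'prod',
            'InitialInstanceCount': 3,
            'InstanceType': 'ml.c3.4xlarge',         # per the slide
            'InitialVariantWeight': 50.0,
        }])
    sm.create_endpoint(EndpointName='demo-endpoint',  # hypothetical name
                       EndpointConfigName='demo-endpoint-config')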
  • 83. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Recap
  • 84. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cyrus M. Vahid CyrusMV@amazon.com
  • 85. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.