This document provides an overview of a workshop on deploying a deep learning framework on Amazon ECS and Spot Instances. The workshop will introduce MXNet, containers, Amazon ECS, Amazon ECR, AWS CloudFormation, Amazon EC2 Spot Fleet and Spot Instances. It will include hands-on labs to build an MXNet Docker image, deploy an MXNet container with ECS, and run an image classification demo using a Spot Fleet on ECS. The overall goal is to learn how to cost-effectively run deep learning workloads on AWS.
2. What to Expect From This Workshop
• Introduce MXNet
• Containers
• Overview of Amazon ECS + Amazon ECR
• Overview of AWS CloudFormation
• Overview of Amazon EC2 Spot Fleet / Spot Instances
• Hands-on workshop
3. • Open source, deep learning framework -
https://github.com/dmlc/mxnet
• Define, train, and deploy deep neural
networks
• Highly scalable – single/multiple hosts,
CPU/GPU support
• Support for multiple languages
What Is MXNet?
OutputInput
4. Why Containers?
• Increase infrastructure utilization
• Environment isolation and fidelity
• Run diverse applications on shared hardware
• Changes are tracked
• Easy to deploy
6. Amazon ECS Benefits
Cluster management
made easy
Flexible scheduling Integrated and
extensible
Security Performance at scale
7. Amazon ECS Architecture
Docker
Task
Container instance
Amazon
ECS
Container
ECS agent
ELB
Internet
ELB
User /
Scheduler
API
Cluster Management Engine
Task
Container
Docker
Task
Container instance
Container
ECS agent
Task
Container
Docker
Task
Container instance
Container
ECS agent
Task
Container
AZ 1 AZ 2
Key/Value Store
Agent Communication Service
8. What Is Amazon ECR?
Amazon EC2 Container Registry (Amazon ECR) is a fully-
managed Docker container registry that makes it easy for
developers to store, manage, and deploy Docker container
images.
Amazon ECR is integrated with Amazon EC2 Container
Service (Amazon ECS), simplifying your development to
production workflow.
Learn more: https://aws.amazon.com/ecr/
9. How Does Amazon ECS Use Amazon ECR?
Amazon
ECR
Subnet
Amazon
ECS
Spot Instance
14. Why Do Customers Use CloudFormation?
Developers/DevOps teams value CloudFormation for its ability to treat
infrastructure as code, allowing them to apply software engineering principles,
such as SOA, revision control, code reviews, integration testing to
infrastructure.
IT Admins and MSPs value CloudFormation as a platform to enable
standardization, managed consumption, and role-specialization.
ISVs value CloudFormation for its ability to support scaling out of multi-tenant
SaaS products by quickly replicating or updating stacks. ISVs also value
CloudFormation as a way to package and deploy their software in their
customer accounts on AWS.
16. On-Demand
Pay for compute
capacity by the hour
with no long-term
commitments
For spiky workloads,
or to define needs
Amazon EC2 Consumption Models
Reserved
Make a low, one-time
payment and receive
a significant discount
on the hourly charge
For committed
utilization
Spot
Bid for unused
capacity, charged at a
Spot Price that
fluctuates based on
supply and demand
For time-insensitive
or transient
workloads
17. With Spot, the Rules Are Simple
Markets where the price of
compute changes based on
supply and demand.
You’ll never pay more than your
bid. When the market exceeds
your bid, you get 2 minutes to
wrap up your work.
18. $0.27 $0.29$0.50
1b 1c1a
8XL
$0.30 $0.16$0.214XL
$0.07 $0.08$0.082XL
$0.05 $0.04$0.04XL
$0.01 $0.04$0.01L
C4
$1.76
On
Demand
$0.88
$0.44
$.22
$0.11
Show Me the Markets!
Each instance family
Each instance size
Each Availability Zone
In every region
Is a separate Spot Market
20. Spot Fleet Helps You…
Launch thousands of Spot Instances
with one single API call
Get the best price
Find the lowest-priced horsepower that works for you
Or
Get diversified resources
Diversify your fleet and grow your availability
¢
Apply custom weighting
Create your own capacity unit based on your application
needs
21. Diversification with Amazon EC2 Spot Fleet
Multiple Spot Instances selected
Multiple Availability Zones
selected
Pick the instances with similar
performance characteristics, e.g.,
c3.large, m3.large, m4.large,
r3.large, c4.large
¢
23. Overall Architecture
Amazon
S3
AWS CLI
Public subnet – AZ #1 Public subnet – AZ #2
Amazon
ECS
Spot Instance Spot Instance
Spot Fleet
>_ SSH
Amazon
ECR
AWS Management
Console
RunTask
24. Amazon
S3
Public subnet – AZ #1 Public subnet – AZ #2
Amazon
ECS
Spot Instance Spot Instance
Spot Fleet
Amazon
ECR
AWS
CloudFormation
Lab 1: Set up the Workshop Environment
31. Related Sessions
• Deep Dive into Apache MXNet on AWS (BDA401)
• An Introduction to Amazon Rekognition, for Deep
Learning-based Computer Vision (BDA301)
• Getting the Most Bang for Your Buck with #EC2
#Winning (SRV301)
• Save 90% on Your Containerized Workload (DEM07)