SlideShare uma empresa Scribd logo
1 de 44
Baixar para ler offline
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine Learning with Kubernetes
Yaniv Donenfeld
AI/ML Solutions
Container Services, AWS
Mike Crute
Sr. Engineer
Container Services, AWS
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Centerpiece for digital transformation
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
EC2 P3
& P3dn
EC2
C5
FPGAs Greengrass
Elastic
inference
FRAMEWORKS INTERFACES INFRASTRUCTURE
Inferentia
EC2
G4
The Amazon ML stack:
Broadest & deepest set of capabilities
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ground Truth Notebooks
Algorithms +
Marketplace
Reinforcement
Learning
Training Optimization Deployment Hosting
ML Frameworks +
Infrastructure EC2 P3
& P3dn
EC2
C5
FPGAs Greengrass
Elastic
inference
FRAMEWORKS INTERFACES INFRASTRUCTURE
Inferentia
EC2
G4
The Amazon ML stack:
Broadest & deepest set of capabilities
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
R E K O G N I T I O N
I M A G E
P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D L E XR E K O G N I T I O N
V I D E O
F O R E C A S TT E X T R A C T P E R S O N A L I Z E
VISION SPEECH LANGUAGE CHATBOTS FORECASTING RECOMMENDATIONS
ML Services Amazon
SageMaker Ground Truth Notebooks Algorithms + Marketplace
Reinforcement
Learning
Training Optimization Deployment Hosting
ML Frameworks +
Infrastructure EC2 P3
& P3dn
EC2
C5
FPGAs Greengrass
Elastic
inference
FRAMEWORKS INTERFACES INFRASTRUCTURE
Inferentia
EC2
G4
The Amazon ML stack:
Broadest & deepest set of capabilities
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Amazon ML stack:
Broadest & deepest set of capabilities
R E K O G N I T I O N
I M A G E
P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D L E XR E K O G N I T I O N
V I D E O
F O R E C A S TT E X T R A C T P E R S O N A L I Z E
VISION SPEECH LANGUAGE CHATBOTS FORECASTING RECOMMENDATIONS
Amazon
SageMaker Ground Truth Notebooks Algorithms + Marketplace
Reinforcement
Learning
Training Optimization Deployment Hosting
EC2 P3
& P3dn
EC2
C5
FPGAs Greengrass
Elastic
inference
FRAMEWORKS INTERFACES
Inferentia
EC2
G4
INFRASTRUCTURE
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
“A little less conversation, a little more action, please”
M A C H I N E L E A R N I N G
S T O R A G E
Amazon Redshift
+ Redshift Spectrum
Amazon
QuickSight
Amazon EMR
Hadoop, Spark, Presto,
Pig, Hive…19 total
Amazon
Athena
Amazon
Kinesis
Amazon
Elasticsearch
Service
AWS Glue
A N A L Y T I C S
Amazon S3
Standard-IA
Amazon S3
Standard
Amazon S3
One Zone-IA
Amazon
Glacier
Amazon S3
Intelligent-
Tiering
N E W
Amazon
EBS
Amazon S3
Glacier Deep
Archive
N E W
R E K O G N I T I O N
I M A G E
P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D L E XR E K O G N I T I O N
V I D E O
F O R E C A S TT E X T R A C T P E R S O N A L I Z E
V I S I O N S P E E C H L A N G U A G E C H A T B O T S F O R E C A S T I N G R E C O M M E N D A T I O N S
Amazon
SageMaker
Ground Truth Notebooks
Algorithms +
Marketplace
Reinforcement
Learning
Training Optimization Deployment Hosting
E C 2 P3
& P3 d n
E C 2
C 5 FPG A s G r e e n g r a s s
E la s t i c
i n f e r e n c e
F R A M E W O R K S I N T E R F A C E S
I n f e r e n t i a
E C 2
G 4
I N F R A S T R U C T U R E
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
R E K O G N I T I O N
I M A G E
P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D L E XR E K O G N I T I O N
V I D E O
F O R E C A S TT E X T R A C T P E R S O N A L I Z E
VISION SPEECH LANGUAGE CHATBOTS FORECASTING RECOMMENDATIONS
Amazon
SageMaker Ground Truth Notebooks Algorithms + Marketplace
Reinforcement
Learning
Training Optimization Deployment Hosting
EC2 P3
& P3dn
EC2
C5
FPGAs Greengrass
Elastic
inference
FRAMEWORKS INTERFACES INFRASTRUCTURE
Inferentia
EC2
G4
Machine Learning using Kubernetes
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ML Frameworks +
Infrastructure EC2 P3
& P3dn
EC2
C5
FPGAs Greengrass
Elastic
inference
FRAMEWORKS INTERFACES INFRASTRUCTURE
Inferentia
EC2
G4
Machine Learning using Kubernetes
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why Machine Learning on Kubernetes?
Composability Portability Scalability
O N - P R E M I S E S C L O U D
http://www.shutterstock.com/gallery-635827p1.html
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Machine Learning Development Workflow
ML Workflow
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why Jupyter?
Jupyter provides an easy
on-ramp to build, train,
and deploy ML models
Jupyter Notebook allows you to
create and share documents that
contain live code, equations,
visualizations, and narrative text
Uses include:
• data cleaning and transformation
• numerical simulation
• statistical modeling
• data visualization
• machine learning
• and much more
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The Machine Learning Development Workflow
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ML on K8s—without KubeFlow
Credits: @aronchik
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ML on K8s—with KubeFlow
Credits: @aronchik
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What’s in
KubeFlow?
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kubeflow Pipelines
• Invoke EMR/Spark jobs for
preprocessing directly as a
pipeline step
• Invoke SageMaker jobs to
utilize the SageMaker
framework from within
Kubeflow
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Advantages of KubeFlow on AWS
EKS cluster provision with
External traffic with
to manage Lustre file system
Centralized and unified K8s logs in
TLS and Auth with and
for your K8s API server endpoint
Detect GPU instance and install
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
TensorFlow
Open source library to develop and train ML models
Created by Google Brain team
Can run on desktop, servers, mobiles, edge devices
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS is the platform of choice to run TensorFlow
of all
TensorFlow
workloads in the
cloud runs on AWS
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Train twice as fast
with TensorFlow
65%
Scaling efficiency
with 256 GPUs
S T O C K T E N S O R F L O W
100%
Scaling efficiency
with 256 GPUS
A W S - O P T I M Z E D T E N S O R F L O W
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Autonomous Vehicle Sensor Technology
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
LIDAR
Camera
GPS/HD Maps
Telematics Control Unit
Accelerometer/Gyroscope
DSRC
Radar
Infrared
Ultrasonic
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SAE
LEVEL
NAME CONTROL
MONITORING
VIGILANCE
FALLBACK
HANDOFF
OPERATING
CONDITIONS
0 Unassisted Human
1 Driver Assist
Human and
AD System
Human Human Some
2 Partial Automation AD System Human
Human
(0 warning)
Some
3 Conditional Automation AD System AD System
Human
(adequate warning,
assumed 15 seconds)
Some
4 High Automation AD System AD System AD System Some
5 Full Automation AD System AD System AD System All
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SAE
LEVEL
NAME CONTROL
MONITORING
VIGILANCE
FALLBACK
HANDOFF
OPERATING
CONDITIONS
0 Unassisted Human
1 Driver Assist
Human and
AD System
Human Human Some
2 Partial Automation AD System Human
Human
(0 warning)
Some
3 Conditional Automation AD System AD System
Human
(adequate warning,
assumed 15 seconds)
Some
4 High Automation AD System AD System AD System Some
5 Full Automation AD System AD System AD System All
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AV Workload Characteristics
• Distributed, Multi-GPU, Multi-Node Training jobs
• ML Practitioners don’t know/want to deal with infrastructure
• Large datasets on S3 require copying to a staging area on the cluster
• Data should be shared across nodes and re-used across jobs
• Distributed training code can be hard
• Capacity provisioning and smart autoscaling across jobs can be
complex
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AV Workload Characteristics
• Distributed, Multi-GPU, Multi-Node Training jobs
• ML Practitioners don’t know/want to deal with infrastructure
Kubeflow on AWS provides an abstraction to setup your
cluster as part of the Kubeflow setup process
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AV Workload Characteristics
• Distributed, Multi-GPU, Multi-Node Training jobs
• Large datasets on S3 require copying to a staging area on the cluster
• Data should be shared across nodes and re-used across jobs
• New CSI driver for EFS and FSx Lustre – dynamically provision a
file system using PVC.
• FSx Lustre can shadow an S3 Bucket – no need to copy
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AV Workload Characteristics
• Distributed, Multi-GPU, Multi-Node Training jobs
• Distributed training code can be hard
• Use Horovod, a Distributed Training framework for TensorFlow,
Keras, PyTorch, and MXNet
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AV Workload Characteristics
• Distributed, Multi-GPU, Multi-Node Training jobs
• Capacity provisioning and smart autoscaling across jobs can be
complex
• Introducing updates to Escalator - a cluster auto scaling
component that is tuned for batch/ML workloads, and EC2 GPU
instance autoscaling
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DEMO
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AV Workload Characteristics
• Large-scale simulation workloads
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
-
200,000
400,000
600,000
800,000
1,000,000
1,200,000
Koala Genomics AV Safety
Concurrent CPUs
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
-
50,000,000
100,000,000
150,000,000
200,000,000
250,000,000
300,000,000
350,000,000
Koala Genomics AV Safety
Total Core Hours / Year
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Rank / Name Rmax / Rpeak (Petaflops)
1. Summit 143.500 / 200.795
2. Sierra 94.640 / 125.436
3. Sunway Tahihu Light 93.015 / 125.436
4. Tianhe-2A 61.445 / 100.679
5. Piz Daint 21.230 / 27.154
6. Trinity 20.159 / 41.461
7. AI Bridging Cloud Infrastructure 19.880 / 32.577
8. SuperMUCNG 19.477 / 26.874
9. Titan 17.590 / 27.113
10. Sequoia 17.173 / 20.133
TOP500 – Top 10 Supercomputers in Nov 2018
We’re helping our AV customers
run at Supercomputer Scale, targeting the
equivalent of one of the Top 10 largest
supercomputers in the world.
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Challenges in setting up containers for ML
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS deep learning containers
KEY FEATURES
Customizable
container images
Support for TensorFlow,
Apache MXNet
Single and multi-node
training and inference
Pre-packaged Docker
container images
fully configured
and validated
Best performance
and scalability
without tuning
Works with Amazon EKS,
Amazon ECS,
and Amazon EC2
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
16 container images
Training Inference
GPU CPU
Python 2.7 Python 3.6
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Getting Started
https://github.com/aws-samples/machine-learning-using-k8s/

Mais conteúdo relacionado

Mais procurados

Breaking the Monolith Road to Containers
Breaking the Monolith Road to ContainersBreaking the Monolith Road to Containers
Breaking the Monolith Road to ContainersAmazon Web Services
 
The evolution of automated reasoning technology at AWS - SEP201 - AWS re:Info...
The evolution of automated reasoning technology at AWS - SEP201 - AWS re:Info...The evolution of automated reasoning technology at AWS - SEP201 - AWS re:Info...
The evolution of automated reasoning technology at AWS - SEP201 - AWS re:Info...Amazon Web Services
 
Secure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerSecure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerArun Gupta
 
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS SummitKubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS SummitAmazon Web Services
 
Design with ops in mind | AWS Summit Tel Aviv 2019
Design with ops in mind | AWS Summit Tel Aviv 2019Design with ops in mind | AWS Summit Tel Aviv 2019
Design with ops in mind | AWS Summit Tel Aviv 2019Amazon Web Services
 
The IoT Offering Explained in Plain English - IOT201 - re:Invent 2017
The IoT Offering Explained in Plain English - IOT201 - re:Invent 2017The IoT Offering Explained in Plain English - IOT201 - re:Invent 2017
The IoT Offering Explained in Plain English - IOT201 - re:Invent 2017Amazon Web Services
 
Twelve-factor serverless applications - MAD302 - Santa Clara AWS Summit
Twelve-factor serverless applications - MAD302 - Santa Clara AWS SummitTwelve-factor serverless applications - MAD302 - Santa Clara AWS Summit
Twelve-factor serverless applications - MAD302 - Santa Clara AWS SummitAmazon Web Services
 
Blur the boundaries between your on-premises to AWS cloud by embracing VMWare...
Blur the boundaries between your on-premises to AWS cloud by embracing VMWare...Blur the boundaries between your on-premises to AWS cloud by embracing VMWare...
Blur the boundaries between your on-premises to AWS cloud by embracing VMWare...AWS Summits
 
ENT227_IoT + Cloud enables Enterprise Digital Transformation
ENT227_IoT + Cloud enables Enterprise Digital TransformationENT227_IoT + Cloud enables Enterprise Digital Transformation
ENT227_IoT + Cloud enables Enterprise Digital TransformationAmazon Web Services
 
Introducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech TalksIntroducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech TalksAmazon Web Services
 
NEW LAUNCH! Amazon FreeRTOS: IoT Operating System for Microcontrollers - IOT2...
NEW LAUNCH! Amazon FreeRTOS: IoT Operating System for Microcontrollers - IOT2...NEW LAUNCH! Amazon FreeRTOS: IoT Operating System for Microcontrollers - IOT2...
NEW LAUNCH! Amazon FreeRTOS: IoT Operating System for Microcontrollers - IOT2...Amazon Web Services
 
AWS IoT를 이용한 퍼스널 푸드 컴퓨터 개발사례::한광희::AWS Summit Seoul 2018
AWS IoT를 이용한 퍼스널 푸드 컴퓨터 개발사례::한광희::AWS Summit Seoul 2018AWS IoT를 이용한 퍼스널 푸드 컴퓨터 개발사례::한광희::AWS Summit Seoul 2018
AWS IoT를 이용한 퍼스널 푸드 컴퓨터 개발사례::한광희::AWS Summit Seoul 2018Amazon Web Services Korea
 
From Code to a running container | AWS Summit Tel Aviv 2019
From Code to a running container | AWS Summit Tel Aviv 2019From Code to a running container | AWS Summit Tel Aviv 2019
From Code to a running container | AWS Summit Tel Aviv 2019AWS Summits
 
Deriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML ArchitecturesDeriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML ArchitecturesAmazon Web Services
 
AWS 無伺服器開發工作坊_Matt.pdf
AWS 無伺服器開發工作坊_Matt.pdfAWS 無伺服器開發工作坊_Matt.pdf
AWS 無伺服器開發工作坊_Matt.pdfAmazon Web Services
 
Architecting Security & Governance Across Your AWS Landing Zone
Architecting Security & Governance Across Your AWS Landing ZoneArchitecting Security & Governance Across Your AWS Landing Zone
Architecting Security & Governance Across Your AWS Landing ZoneAmazon Web Services
 
Need for Speed – Intro To Real-Time Data Streaming Analytics on AWS | AWS Sum...
Need for Speed – Intro To Real-Time Data Streaming Analytics on AWS | AWS Sum...Need for Speed – Intro To Real-Time Data Streaming Analytics on AWS | AWS Sum...
Need for Speed – Intro To Real-Time Data Streaming Analytics on AWS | AWS Sum...AWS Summits
 
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS SummitPerforming serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS SummitAmazon Web Services
 
IOT311_Customer Stories of Things, Cloud, and Analytics on AWS
IOT311_Customer Stories of Things, Cloud, and Analytics on AWSIOT311_Customer Stories of Things, Cloud, and Analytics on AWS
IOT311_Customer Stories of Things, Cloud, and Analytics on AWSAmazon Web Services
 
Transform with Cloud to drive your Future | AWS Summit Tel Aviv 2019
Transform with Cloud to drive your Future | AWS Summit Tel Aviv 2019Transform with Cloud to drive your Future | AWS Summit Tel Aviv 2019
Transform with Cloud to drive your Future | AWS Summit Tel Aviv 2019Amazon Web Services
 

Mais procurados (20)

Breaking the Monolith Road to Containers
Breaking the Monolith Road to ContainersBreaking the Monolith Road to Containers
Breaking the Monolith Road to Containers
 
The evolution of automated reasoning technology at AWS - SEP201 - AWS re:Info...
The evolution of automated reasoning technology at AWS - SEP201 - AWS re:Info...The evolution of automated reasoning technology at AWS - SEP201 - AWS re:Info...
The evolution of automated reasoning technology at AWS - SEP201 - AWS re:Info...
 
Secure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerSecure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using Firecracker
 
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS SummitKubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
 
Design with ops in mind | AWS Summit Tel Aviv 2019
Design with ops in mind | AWS Summit Tel Aviv 2019Design with ops in mind | AWS Summit Tel Aviv 2019
Design with ops in mind | AWS Summit Tel Aviv 2019
 
The IoT Offering Explained in Plain English - IOT201 - re:Invent 2017
The IoT Offering Explained in Plain English - IOT201 - re:Invent 2017The IoT Offering Explained in Plain English - IOT201 - re:Invent 2017
The IoT Offering Explained in Plain English - IOT201 - re:Invent 2017
 
Twelve-factor serverless applications - MAD302 - Santa Clara AWS Summit
Twelve-factor serverless applications - MAD302 - Santa Clara AWS SummitTwelve-factor serverless applications - MAD302 - Santa Clara AWS Summit
Twelve-factor serverless applications - MAD302 - Santa Clara AWS Summit
 
Blur the boundaries between your on-premises to AWS cloud by embracing VMWare...
Blur the boundaries between your on-premises to AWS cloud by embracing VMWare...Blur the boundaries between your on-premises to AWS cloud by embracing VMWare...
Blur the boundaries between your on-premises to AWS cloud by embracing VMWare...
 
ENT227_IoT + Cloud enables Enterprise Digital Transformation
ENT227_IoT + Cloud enables Enterprise Digital TransformationENT227_IoT + Cloud enables Enterprise Digital Transformation
ENT227_IoT + Cloud enables Enterprise Digital Transformation
 
Introducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech TalksIntroducing Amazon SageMaker - AWS Online Tech Talks
Introducing Amazon SageMaker - AWS Online Tech Talks
 
NEW LAUNCH! Amazon FreeRTOS: IoT Operating System for Microcontrollers - IOT2...
NEW LAUNCH! Amazon FreeRTOS: IoT Operating System for Microcontrollers - IOT2...NEW LAUNCH! Amazon FreeRTOS: IoT Operating System for Microcontrollers - IOT2...
NEW LAUNCH! Amazon FreeRTOS: IoT Operating System for Microcontrollers - IOT2...
 
AWS IoT를 이용한 퍼스널 푸드 컴퓨터 개발사례::한광희::AWS Summit Seoul 2018
AWS IoT를 이용한 퍼스널 푸드 컴퓨터 개발사례::한광희::AWS Summit Seoul 2018AWS IoT를 이용한 퍼스널 푸드 컴퓨터 개발사례::한광희::AWS Summit Seoul 2018
AWS IoT를 이용한 퍼스널 푸드 컴퓨터 개발사례::한광희::AWS Summit Seoul 2018
 
From Code to a running container | AWS Summit Tel Aviv 2019
From Code to a running container | AWS Summit Tel Aviv 2019From Code to a running container | AWS Summit Tel Aviv 2019
From Code to a running container | AWS Summit Tel Aviv 2019
 
Deriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML ArchitecturesDeriving Value with Next Gen Analytics and ML Architectures
Deriving Value with Next Gen Analytics and ML Architectures
 
AWS 無伺服器開發工作坊_Matt.pdf
AWS 無伺服器開發工作坊_Matt.pdfAWS 無伺服器開發工作坊_Matt.pdf
AWS 無伺服器開發工作坊_Matt.pdf
 
Architecting Security & Governance Across Your AWS Landing Zone
Architecting Security & Governance Across Your AWS Landing ZoneArchitecting Security & Governance Across Your AWS Landing Zone
Architecting Security & Governance Across Your AWS Landing Zone
 
Need for Speed – Intro To Real-Time Data Streaming Analytics on AWS | AWS Sum...
Need for Speed – Intro To Real-Time Data Streaming Analytics on AWS | AWS Sum...Need for Speed – Intro To Real-Time Data Streaming Analytics on AWS | AWS Sum...
Need for Speed – Intro To Real-Time Data Streaming Analytics on AWS | AWS Sum...
 
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS SummitPerforming serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
Performing serverless analytics in AWS Glue - ADB202 - Chicago AWS Summit
 
IOT311_Customer Stories of Things, Cloud, and Analytics on AWS
IOT311_Customer Stories of Things, Cloud, and Analytics on AWSIOT311_Customer Stories of Things, Cloud, and Analytics on AWS
IOT311_Customer Stories of Things, Cloud, and Analytics on AWS
 
Transform with Cloud to drive your Future | AWS Summit Tel Aviv 2019
Transform with Cloud to drive your Future | AWS Summit Tel Aviv 2019Transform with Cloud to drive your Future | AWS Summit Tel Aviv 2019
Transform with Cloud to drive your Future | AWS Summit Tel Aviv 2019
 

Semelhante a Machine Learning with Kubernetes- AWS Container Day 2019 Barcelona

Machine learning using Kubernetes
Machine learning using KubernetesMachine learning using Kubernetes
Machine learning using KubernetesArun Gupta
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesArun Gupta
 
Sviluppa, addestra e distribuisci modelli di machine learning.pdf
Sviluppa, addestra e distribuisci modelli di machine learning.pdfSviluppa, addestra e distribuisci modelli di machine learning.pdf
Sviluppa, addestra e distribuisci modelli di machine learning.pdfAmazon Web Services
 
Build-Train-Deploy-Machine-Learning-Models-at-Any-Scale
Build-Train-Deploy-Machine-Learning-Models-at-Any-ScaleBuild-Train-Deploy-Machine-Learning-Models-at-Any-Scale
Build-Train-Deploy-Machine-Learning-Models-at-Any-ScaleAmazon Web Services
 
Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Arun Gupta
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Julien SIMON
 
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...Amazon Web Services
 
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMakerDeep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMakerAmazon Web Services
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Julien SIMON
 
[NEW LAUNCH] Introducing AWS Deep Learning Containers
[NEW LAUNCH] Introducing AWS Deep Learning Containers[NEW LAUNCH] Introducing AWS Deep Learning Containers
[NEW LAUNCH] Introducing AWS Deep Learning ContainersAmazon Web Services
 
Art of the possible- Leveraging Machine Learning to Improve Forecasting and G...
Art of the possible- Leveraging Machine Learning to Improve Forecasting and G...Art of the possible- Leveraging Machine Learning to Improve Forecasting and G...
Art of the possible- Leveraging Machine Learning to Improve Forecasting and G...Amazon Web Services
 
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)Julien SIMON
 
Scale - Amazon SageMaker Deep Dive for Builders
Scale - Amazon SageMaker Deep Dive for BuildersScale - Amazon SageMaker Deep Dive for Builders
Scale - Amazon SageMaker Deep Dive for BuildersAmazon Web Services
 
Amazon AI/ML Overview
Amazon AI/ML OverviewAmazon AI/ML Overview
Amazon AI/ML OverviewBESPIN GLOBAL
 
Artifical Intelligence and Machine Learning 201, AWS Federal Pop-Up Loft
Artifical Intelligence and Machine Learning 201, AWS Federal Pop-Up LoftArtifical Intelligence and Machine Learning 201, AWS Federal Pop-Up Loft
Artifical Intelligence and Machine Learning 201, AWS Federal Pop-Up LoftAmazon Web Services
 
[AWS Dev Day] 인공지능 / 기계 학습 | Intel on AWS, AI/ML Service 성능 향상을 위한 협력 모델 - 서...
[AWS Dev Day] 인공지능 / 기계 학습 | Intel on AWS,  AI/ML Service 성능 향상을 위한 협력 모델 - 서...[AWS Dev Day] 인공지능 / 기계 학습 | Intel on AWS,  AI/ML Service 성능 향상을 위한 협력 모델 - 서...
[AWS Dev Day] 인공지능 / 기계 학습 | Intel on AWS, AI/ML Service 성능 향상을 위한 협력 모델 - 서...Amazon Web Services Korea
 
Machine learning for developers & data scientists with Amazon SageMaker - AIM...
Machine learning for developers & data scientists with Amazon SageMaker - AIM...Machine learning for developers & data scientists with Amazon SageMaker - AIM...
Machine learning for developers & data scientists with Amazon SageMaker - AIM...Amazon Web Services
 
AI/ML Week: Strengthen Cybersecurity
AI/ML Week: Strengthen CybersecurityAI/ML Week: Strengthen Cybersecurity
AI/ML Week: Strengthen CybersecurityAmazon Web Services
 
Getting Started with AIML Using Amazon Sagemaker_AWSPSSummit_Singapore
Getting Started with AIML Using Amazon Sagemaker_AWSPSSummit_SingaporeGetting Started with AIML Using Amazon Sagemaker_AWSPSSummit_Singapore
Getting Started with AIML Using Amazon Sagemaker_AWSPSSummit_SingaporeAmazon Web Services
 

Semelhante a Machine Learning with Kubernetes- AWS Container Day 2019 Barcelona (20)

Machine learning using Kubernetes
Machine learning using KubernetesMachine learning using Kubernetes
Machine learning using Kubernetes
 
Machine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and KubernetesMachine Learning using Kubeflow and Kubernetes
Machine Learning using Kubeflow and Kubernetes
 
Sviluppa, addestra e distribuisci modelli di machine learning.pdf
Sviluppa, addestra e distribuisci modelli di machine learning.pdfSviluppa, addestra e distribuisci modelli di machine learning.pdf
Sviluppa, addestra e distribuisci modelli di machine learning.pdf
 
Build-Train-Deploy-Machine-Learning-Models-at-Any-Scale
Build-Train-Deploy-Machine-Learning-Models-at-Any-ScaleBuild-Train-Deploy-Machine-Learning-Models-at-Any-Scale
Build-Train-Deploy-Machine-Learning-Models-at-Any-Scale
 
Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019Machine Learning using Kubernetes - AI Conclave 2019
Machine Learning using Kubernetes - AI Conclave 2019
 
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
Deep Learning with Tensorflow and Apache MXNet on AWS (April 2019)
 
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...
Amazon SageMaker sviluppa, addestra e distribuisci modelli di Machine Learnin...
 
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMakerDeep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
Deep Learning con TensorFlow and Apache MXNet su Amazon SageMaker
 
Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)Build Machine Learning Models with Amazon SageMaker (April 2019)
Build Machine Learning Models with Amazon SageMaker (April 2019)
 
[NEW LAUNCH] Introducing AWS Deep Learning Containers
[NEW LAUNCH] Introducing AWS Deep Learning Containers[NEW LAUNCH] Introducing AWS Deep Learning Containers
[NEW LAUNCH] Introducing AWS Deep Learning Containers
 
Art of the possible- Leveraging Machine Learning to Improve Forecasting and G...
Art of the possible- Leveraging Machine Learning to Improve Forecasting and G...Art of the possible- Leveraging Machine Learning to Improve Forecasting and G...
Art of the possible- Leveraging Machine Learning to Improve Forecasting and G...
 
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
Deep Learning with TensorFlow and Apache MXNet on Amazon SageMaker (March 2019)
 
Scale - Amazon SageMaker Deep Dive for Builders
Scale - Amazon SageMaker Deep Dive for BuildersScale - Amazon SageMaker Deep Dive for Builders
Scale - Amazon SageMaker Deep Dive for Builders
 
Amazon AI/ML Overview
Amazon AI/ML OverviewAmazon AI/ML Overview
Amazon AI/ML Overview
 
Data Lake na área da saúde- AWS
Data Lake na área da saúde- AWSData Lake na área da saúde- AWS
Data Lake na área da saúde- AWS
 
Artifical Intelligence and Machine Learning 201, AWS Federal Pop-Up Loft
Artifical Intelligence and Machine Learning 201, AWS Federal Pop-Up LoftArtifical Intelligence and Machine Learning 201, AWS Federal Pop-Up Loft
Artifical Intelligence and Machine Learning 201, AWS Federal Pop-Up Loft
 
[AWS Dev Day] 인공지능 / 기계 학습 | Intel on AWS, AI/ML Service 성능 향상을 위한 협력 모델 - 서...
[AWS Dev Day] 인공지능 / 기계 학습 | Intel on AWS,  AI/ML Service 성능 향상을 위한 협력 모델 - 서...[AWS Dev Day] 인공지능 / 기계 학습 | Intel on AWS,  AI/ML Service 성능 향상을 위한 협력 모델 - 서...
[AWS Dev Day] 인공지능 / 기계 학습 | Intel on AWS, AI/ML Service 성능 향상을 위한 협력 모델 - 서...
 
Machine learning for developers & data scientists with Amazon SageMaker - AIM...
Machine learning for developers & data scientists with Amazon SageMaker - AIM...Machine learning for developers & data scientists with Amazon SageMaker - AIM...
Machine learning for developers & data scientists with Amazon SageMaker - AIM...
 
AI/ML Week: Strengthen Cybersecurity
AI/ML Week: Strengthen CybersecurityAI/ML Week: Strengthen Cybersecurity
AI/ML Week: Strengthen Cybersecurity
 
Getting Started with AIML Using Amazon Sagemaker_AWSPSSummit_Singapore
Getting Started with AIML Using Amazon Sagemaker_AWSPSSummit_SingaporeGetting Started with AIML Using Amazon Sagemaker_AWSPSSummit_Singapore
Getting Started with AIML Using Amazon Sagemaker_AWSPSSummit_Singapore
 

Mais de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Machine Learning with Kubernetes- AWS Container Day 2019 Barcelona

  • 1. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Machine Learning with Kubernetes Yaniv Donenfeld AI/ML Solutions Container Services, AWS Mike Crute Sr. Engineer Container Services, AWS
  • 2. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Centerpiece for digital transformation
  • 3. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 4. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. EC2 P3 & P3dn EC2 C5 FPGAs Greengrass Elastic inference FRAMEWORKS INTERFACES INFRASTRUCTURE Inferentia EC2 G4 The Amazon ML stack: Broadest & deepest set of capabilities
  • 5. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ground Truth Notebooks Algorithms + Marketplace Reinforcement Learning Training Optimization Deployment Hosting ML Frameworks + Infrastructure EC2 P3 & P3dn EC2 C5 FPGAs Greengrass Elastic inference FRAMEWORKS INTERFACES INFRASTRUCTURE Inferentia EC2 G4 The Amazon ML stack: Broadest & deepest set of capabilities
  • 6. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. R E K O G N I T I O N I M A G E P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D L E XR E K O G N I T I O N V I D E O F O R E C A S TT E X T R A C T P E R S O N A L I Z E VISION SPEECH LANGUAGE CHATBOTS FORECASTING RECOMMENDATIONS ML Services Amazon SageMaker Ground Truth Notebooks Algorithms + Marketplace Reinforcement Learning Training Optimization Deployment Hosting ML Frameworks + Infrastructure EC2 P3 & P3dn EC2 C5 FPGAs Greengrass Elastic inference FRAMEWORKS INTERFACES INFRASTRUCTURE Inferentia EC2 G4 The Amazon ML stack: Broadest & deepest set of capabilities
  • 7. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The Amazon ML stack: Broadest & deepest set of capabilities R E K O G N I T I O N I M A G E P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D L E XR E K O G N I T I O N V I D E O F O R E C A S TT E X T R A C T P E R S O N A L I Z E VISION SPEECH LANGUAGE CHATBOTS FORECASTING RECOMMENDATIONS Amazon SageMaker Ground Truth Notebooks Algorithms + Marketplace Reinforcement Learning Training Optimization Deployment Hosting EC2 P3 & P3dn EC2 C5 FPGAs Greengrass Elastic inference FRAMEWORKS INTERFACES Inferentia EC2 G4 INFRASTRUCTURE
  • 8. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. “A little less conversation, a little more action, please” M A C H I N E L E A R N I N G S T O R A G E Amazon Redshift + Redshift Spectrum Amazon QuickSight Amazon EMR Hadoop, Spark, Presto, Pig, Hive…19 total Amazon Athena Amazon Kinesis Amazon Elasticsearch Service AWS Glue A N A L Y T I C S Amazon S3 Standard-IA Amazon S3 Standard Amazon S3 One Zone-IA Amazon Glacier Amazon S3 Intelligent- Tiering N E W Amazon EBS Amazon S3 Glacier Deep Archive N E W R E K O G N I T I O N I M A G E P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D L E XR E K O G N I T I O N V I D E O F O R E C A S TT E X T R A C T P E R S O N A L I Z E V I S I O N S P E E C H L A N G U A G E C H A T B O T S F O R E C A S T I N G R E C O M M E N D A T I O N S Amazon SageMaker Ground Truth Notebooks Algorithms + Marketplace Reinforcement Learning Training Optimization Deployment Hosting E C 2 P3 & P3 d n E C 2 C 5 FPG A s G r e e n g r a s s E la s t i c i n f e r e n c e F R A M E W O R K S I N T E R F A C E S I n f e r e n t i a E C 2 G 4 I N F R A S T R U C T U R E
  • 9. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. R E K O G N I T I O N I M A G E P O L L Y T R A N S C R I B E T R A N S L A T E C O M P R E H E N D L E XR E K O G N I T I O N V I D E O F O R E C A S TT E X T R A C T P E R S O N A L I Z E VISION SPEECH LANGUAGE CHATBOTS FORECASTING RECOMMENDATIONS Amazon SageMaker Ground Truth Notebooks Algorithms + Marketplace Reinforcement Learning Training Optimization Deployment Hosting EC2 P3 & P3dn EC2 C5 FPGAs Greengrass Elastic inference FRAMEWORKS INTERFACES INFRASTRUCTURE Inferentia EC2 G4 Machine Learning using Kubernetes
  • 10. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ML Frameworks + Infrastructure EC2 P3 & P3dn EC2 C5 FPGAs Greengrass Elastic inference FRAMEWORKS INTERFACES INFRASTRUCTURE Inferentia EC2 G4 Machine Learning using Kubernetes
  • 11. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why Machine Learning on Kubernetes? Composability Portability Scalability O N - P R E M I S E S C L O U D http://www.shutterstock.com/gallery-635827p1.html
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 13. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The Machine Learning Development Workflow ML Workflow
  • 14. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why Jupyter? Jupyter provides an easy on-ramp to build, train, and deploy ML models Jupyter Notebook allows you to create and share documents that contain live code, equations, visualizations, and narrative text Uses include: • data cleaning and transformation • numerical simulation • statistical modeling • data visualization • machine learning • and much more
  • 15. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The Machine Learning Development Workflow
  • 16. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ML on K8s—without KubeFlow Credits: @aronchik
  • 17. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ML on K8s—with KubeFlow Credits: @aronchik
  • 18. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What’s in KubeFlow?
  • 19. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 20. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Kubeflow Pipelines • Invoke EMR/Spark jobs for preprocessing directly as a pipeline step • Invoke SageMaker jobs to utilize the SageMaker framework from within Kubeflow
  • 21. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Advantages of KubeFlow on AWS EKS cluster provision with External traffic with to manage Lustre file system Centralized and unified K8s logs in TLS and Auth with and for your K8s API server endpoint Detect GPU instance and install
  • 22. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 23. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. TensorFlow Open source library to develop and train ML models Created by Google Brain team Can run on desktop, servers, mobiles, edge devices
  • 24. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS is the platform of choice to run TensorFlow of all TensorFlow workloads in the cloud runs on AWS
  • 25. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Train twice as fast with TensorFlow 65% Scaling efficiency with 256 GPUs S T O C K T E N S O R F L O W 100% Scaling efficiency with 256 GPUS A W S - O P T I M Z E D T E N S O R F L O W
  • 26. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Autonomous Vehicle Sensor Technology © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. LIDAR Camera GPS/HD Maps Telematics Control Unit Accelerometer/Gyroscope DSRC Radar Infrared Ultrasonic
  • 27. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SAE LEVEL NAME CONTROL MONITORING VIGILANCE FALLBACK HANDOFF OPERATING CONDITIONS 0 Unassisted Human 1 Driver Assist Human and AD System Human Human Some 2 Partial Automation AD System Human Human (0 warning) Some 3 Conditional Automation AD System AD System Human (adequate warning, assumed 15 seconds) Some 4 High Automation AD System AD System AD System Some 5 Full Automation AD System AD System AD System All © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SAE LEVEL NAME CONTROL MONITORING VIGILANCE FALLBACK HANDOFF OPERATING CONDITIONS 0 Unassisted Human 1 Driver Assist Human and AD System Human Human Some 2 Partial Automation AD System Human Human (0 warning) Some 3 Conditional Automation AD System AD System Human (adequate warning, assumed 15 seconds) Some 4 High Automation AD System AD System AD System Some 5 Full Automation AD System AD System AD System All
  • 28. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AV Workload Characteristics • Distributed, Multi-GPU, Multi-Node Training jobs • ML Practitioners don’t know/want to deal with infrastructure • Large datasets on S3 require copying to a staging area on the cluster • Data should be shared across nodes and re-used across jobs • Distributed training code can be hard • Capacity provisioning and smart autoscaling across jobs can be complex
  • 29. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AV Workload Characteristics • Distributed, Multi-GPU, Multi-Node Training jobs • ML Practitioners don’t know/want to deal with infrastructure Kubeflow on AWS provides an abstraction to setup your cluster as part of the Kubeflow setup process
  • 30. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AV Workload Characteristics • Distributed, Multi-GPU, Multi-Node Training jobs • Large datasets on S3 require copying to a staging area on the cluster • Data should be shared across nodes and re-used across jobs • New CSI driver for EFS and FSx Lustre – dynamically provision a file system using PVC. • FSx Lustre can shadow an S3 Bucket – no need to copy
  • 31. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AV Workload Characteristics • Distributed, Multi-GPU, Multi-Node Training jobs • Distributed training code can be hard • Use Horovod, a Distributed Training framework for TensorFlow, Keras, PyTorch, and MXNet
  • 32. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AV Workload Characteristics • Distributed, Multi-GPU, Multi-Node Training jobs • Capacity provisioning and smart autoscaling across jobs can be complex • Introducing updates to Escalator - a cluster auto scaling component that is tuned for batch/ML workloads, and EC2 GPU instance autoscaling
  • 33. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. DEMO
  • 34. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 35. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AV Workload Characteristics • Large-scale simulation workloads
  • 36. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 37. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. - 200,000 400,000 600,000 800,000 1,000,000 1,200,000 Koala Genomics AV Safety Concurrent CPUs
  • 38. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. - 50,000,000 100,000,000 150,000,000 200,000,000 250,000,000 300,000,000 350,000,000 Koala Genomics AV Safety Total Core Hours / Year
  • 39. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Rank / Name Rmax / Rpeak (Petaflops) 1. Summit 143.500 / 200.795 2. Sierra 94.640 / 125.436 3. Sunway Tahihu Light 93.015 / 125.436 4. Tianhe-2A 61.445 / 100.679 5. Piz Daint 21.230 / 27.154 6. Trinity 20.159 / 41.461 7. AI Bridging Cloud Infrastructure 19.880 / 32.577 8. SuperMUCNG 19.477 / 26.874 9. Titan 17.590 / 27.113 10. Sequoia 17.173 / 20.133 TOP500 – Top 10 Supercomputers in Nov 2018
  • 40. We’re helping our AV customers run at Supercomputer Scale, targeting the equivalent of one of the Top 10 largest supercomputers in the world.
  • 41. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Challenges in setting up containers for ML
  • 42. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS deep learning containers KEY FEATURES Customizable container images Support for TensorFlow, Apache MXNet Single and multi-node training and inference Pre-packaged Docker container images fully configured and validated Best performance and scalability without tuning Works with Amazon EKS, Amazon ECS, and Amazon EC2
  • 43. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 16 container images Training Inference GPU CPU Python 2.7 Python 3.6
  • 44. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Getting Started https://github.com/aws-samples/machine-learning-using-k8s/