Real-world High Performance & High Throughput
Computing on AWS
Dr Matthew Berryman
Managing Director, Across the Cloud
Chair, High Performance Steering Committee, University of Wollongong
Background slides by Adrian White, Manager, APAC Research & Technical Computing, AWS
5/12/2017
Tools should fit your workflows
[Diagram: workflow stages (Events, Collect, Workflows, Integrate, Discover, Validate & Share) in real-time, streaming, and batch modes, mapped to Amazon Kinesis, Amazon EMR, Amazon S3, AWS Lambda, AWS Batch, AWS CLI & SDKs, Amazon Redshift, Amazon Athena, an HPC cluster, and Amazon API Gateway]
© 2017, Amazon Web Services, Inc. or its Affiliates, All rights reserved.
A cluster in the cloud is an ephemeral tool
Amazon S3
Source data IN
Data product OUT
Compute in the cloud is flexible
• General purpose: M4, M3
• Compute optimized: C5, C4, C3
• Storage and I/O optimized: I3, I2, D2, HS1
• GPU or FPGA enabled: P2, G2, F1
• Memory optimized: X1, R4, R3
Instance types within a family (R4)
Model        vCPUs  Memory (GiB)  Networking performance
r4.large     2      15.25         Up to 10 Gbps
r4.xlarge    4      30.5          Up to 10 Gbps
r4.2xlarge   8      61            Up to 10 Gbps
r4.4xlarge   16     122           Up to 10 Gbps
r4.8xlarge   32     244           10 Gbps
r4.16xlarge  64     488           20 Gbps
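As a rough illustration of choosing a size within the family, the sketch below picks the smallest R4 model that meets a vCPU and memory requirement. The figures are copied from the table, with the 2-vCPU row taken to be r4.large; the function name `smallest_fit` is hypothetical and not part of any AWS tool.

```python
# R4 sizes as listed on the slide: (model, vCPUs, memory in GiB).
# Assumption: the 2-vCPU / 15.25 GiB row is r4.large.
R4_SIZES = [
    ("r4.large", 2, 15.25),
    ("r4.xlarge", 4, 30.5),
    ("r4.2xlarge", 8, 61),
    ("r4.4xlarge", 16, 122),
    ("r4.8xlarge", 32, 244),
    ("r4.16xlarge", 64, 488),
]


def smallest_fit(min_vcpus, min_memory_gib):
    """Return the first (smallest) model meeting both requirements, or None."""
    for model, vcpus, memory_gib in R4_SIZES:
        if vcpus >= min_vcpus and memory_gib >= min_memory_gib:
            return model
    return None


print(smallest_fit(8, 100))  # r4.4xlarge
```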
Clusters in the cloud are fit for purpose
[Diagram: Amazon S3 alongside separate clusters built from R4, P2, and C5 instances]
Clusters can scale and are elastic
[Figure: three plots of c against t, one for each case: W = 1, C = 1; W = n, C = n; W = 0, C ~ 0]
Everyone consumes S3
Collect Share
Keep storage simple – S3 and POSIX cache
[Diagram: nodes labelled M, S, and Mgt]
N.B. Data lifecycle is required
HPC+HTC Tools
A few HPC and HTC tools on AWS
• CfnCluster is provided by AWS to quickly provision configurable HPC and HTC cluster environments
• Alces Flight is available in the AWS Marketplace and bundles 1000+ commonly used scientific applications: https://aws.amazon.com/marketplace/
• AWS Batch provides compute resources via Docker containers with user-definable queues and an optimised job scheduler
• Amazon EMR provides a managed Hadoop framework supporting Apache Spark, HBase, Presto, and Flink on Amazon EC2 and EC2 Spot
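To make the AWS Batch queue-and-container model a little more concrete, here is a minimal sketch of submitting a containerised job through boto3's `submit_job` call. The job name, queue name, job definition, command, and S3 path are all hypothetical placeholders; the queue and job definition would need to exist in your account already.

```python
def build_job_request(job_name, queue, definition, command, env=None):
    """Assemble keyword arguments for the AWS Batch SubmitJob call."""
    overrides = {"command": list(command)}
    if env:
        # Batch expects environment variables as a list of name/value pairs.
        overrides["environment"] = [
            {"name": key, "value": value} for key, value in sorted(env.items())
        ]
    return {
        "jobName": job_name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "containerOverrides": overrides,
    }


def submit(request):
    """Send the request with boto3 (needs AWS credentials and an existing queue)."""
    import boto3  # imported here so the builder above runs without boto3 installed

    return boto3.client("batch").submit_job(**request)["jobId"]


# Hypothetical names throughout; nothing here comes from the talk.
request = build_job_request(
    "odm-demo",
    "example-queue",
    "example-job-def",
    ["python", "run.py"],
    {"INPUT": "s3://example-bucket/images/"},
)
print(request["containerOverrides"]["command"])
```

Only `build_job_request` runs here; calling `submit(request)` would place the job on the queue for the Batch scheduler to run in a container.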
A closer look at Alces Flight
1000+ popular scientific applications
• Multiple versions, complete with libraries and various compiler optimizations, ready to run
• Supports Docker and Singularity
• Slurm default scheduler (also PBS Pro, SGE, etc.)
Available via the AWS Marketplace
http://alces-flight.com/ for more information
OpenDroneMap on Alces Flight architecture
[Diagram: multiple R4 nodes, each r4.8xlarge with 32 vCPUs, 244 GiB RAM, and 10 Gbps networking]
Demo: Push-button HPC + aerial imagery processing on Alces Flight HPC
How’d that magic happen?
So, how much did this cost?
Item                                              On-demand / hr  Spot / hr           Running total
Login node (r4.2xlarge)                           $1.91           -                   $1.91
Compute nodes (r4.8xlarge x 4)                    $8.52           $1.20 (85% saving)  $3.60
Shared storage (1TB general purpose SSD via NFS)  $0.17           -                   $0.17
Data transfer + S3 (egress)                       $0.07           -                   $0.07
Processing time: 3 hours
GRAND TOTAL: $5.75
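The arithmetic behind these figures can be checked with a short sketch. It assumes the running-total column is read as shown on the slide, and that the compute-node total is the spot rate multiplied by the 3-hour processing time, which is consistent with $1.20 x 3 = $3.60.

```python
from decimal import Decimal

HOURS = 3  # processing time from the slide

# Running totals as shown on the slide, in US dollars.
running_totals = {
    "login node": Decimal("1.91"),
    "compute nodes (spot)": Decimal("3.60"),
    "shared storage": Decimal("0.17"),
    "data transfer + S3": Decimal("0.07"),
}

grand_total = sum(running_totals.values(), Decimal("0"))

# Assumption: the compute-node total is the hourly spot rate times the run time.
spot_per_hour = Decimal("1.20")
on_demand_per_hour = Decimal("8.52")
compute_total = spot_per_hour * HOURS
saving = 1 - spot_per_hour / on_demand_per_hour

print(grand_total)               # 5.75
print(compute_total)             # 3.60
print(f"{float(saving):.1%}")    # 85.9%
```

The computed saving of about 85.9% matches the roughly 85% quoted on the slide.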
Evolving the compute paradigm
Physical Virtualization Containerization Serverless
Simplifying and cutting costs further: OpenDroneMap-ECS
Copy
Run
Demo: OpenDroneMap-ECS
AWS Batch
Take a look at https://www.nextflow.io/
“HPC” on Lambda?
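import numpy as np
import pywren

def my_function(b):
    # Draw a random vector and matrix with standard deviation b, then multiply them.
    x = np.random.normal(0, b, 1024)
    A = np.random.normal(0, b, (1024, 1024))
    return np.dot(A, x)

# Run my_function once per value of b, each in its own AWS Lambda invocation.
pwex = pywren.default_executor()
res = pwex.map(my_function, np.linspace(0.1, 100, 1000))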
PyWren.io
PyWren lets you run your existing Python code at massive scale via AWS Lambda
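For readers without an AWS account, the same map pattern can be tried locally. The sketch below is a stand-in using only the Python standard library: it runs on local threads rather than Lambda, does not use PyWren at all, and shrinks the problem size (the constant `N` is an assumption chosen so it runs quickly).

```python
import random
from concurrent.futures import ThreadPoolExecutor

N = 64  # smaller than the slide's 1024 so the sketch runs quickly


def my_function(b):
    """Multiply a random N x N matrix by a random vector, both with std dev b."""
    rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
    x = [rng.gauss(0, b) for _ in range(N)]
    A = [[rng.gauss(0, b) for _ in range(N)] for _ in range(N)]
    return [sum(a * v for a, v in zip(row, x)) for row in A]


# Ten evenly spaced values of b between 0.1 and 100.
scales = [0.1 + i * (100 - 0.1) / 9 for i in range(10)]

# pool.map plays the role that the executor's map call plays on the slide.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(my_function, scales))

print(len(results), len(results[0]))  # 10 64
```

The shape of the program is the point: one function, one list of inputs, one map call. On the slide that map call is what fans the work out across Lambda.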
CSIRO have built GT-Scan2 for CRISPR/Cas9 analysis on AWS Lambda
Before you go home…
Do two things!
1. Register and enroll in the AWS Research Cloud Program: https://aws.amazon.com/rcp
2. Launch your own personal cluster using Alces Flight: http://alces-flight.com/community
More Information
• High Performance Computing on AWS
(whitepapers, customer examples, tech overviews)
• AWS Compute Blog
• Research & Technical Computing on AWS
• AWS Research Cloud Program
HPC + HTC Tools
CfnCluster, AWS Batch, Alces Flight
PyWren: Teraflops and microservices
Source code
• IoT push button HPC: https://github.com/AcrossTheCloud/iot-button-hpc
• Alces Flight Open Drone Map: https://github.com/AcrossTheCloud/alces-flight-odm
• OpenDroneMap-ECS: https://github.com/opendronemap/opendronemap-ecs
• The slowest Internet-connected computer: https://github.com/matthewberryman/brunsviga (live at brunsviga.io)
Thank you!

 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Real world High Performance & High Throughput Computing on AWS

  • 1. Real-world High Performance & High Throughput Computing on AWS. Dr Matthew Berryman: Managing Director, Across the Cloud; Chair, High Performance Steering Committee, University of Wollongong. Background slides by Adrian White, Manager, APAC Research & Technical Computing, AWS. 5/12/2017
  • 2. Tools should fit your workflows. [Diagram: stages Collect, Integrate, Discover, Validate & Share; labels Events, Workflows, Real-time, Batch, Streaming; services Amazon Kinesis, Amazon EMR, Amazon S3, AWS Lambda, AWS Batch, AWS CLI & SDKs, Amazon Redshift, Amazon Athena, HPC cluster, Amazon API Gateway.] © 2017, Amazon Web Services, Inc. or its Affiliates, All rights reserved.
  • 3. A cluster in the cloud is an ephemeral tool. [Diagram labels: Amazon S3; source data IN; data product OUT.]
  • 4. Compute in the cloud is flexible. General purpose: M4, M3. Compute optimized: C5, C4, C3. Memory optimized: X1, R4, R3. Storage and I/O optimized: I3, I2, D2, HS1. GPU or FPGA enabled: P2, G2, F1.
  • 5. Instance types within a family (R4):
      Model         vCPUs   Memory (GiB)   Networking performance
      r4.large      2       15.25          Up to 10 Gbps
      r4.xlarge     4       30.5           Up to 10 Gbps
      r4.2xlarge    8       61             Up to 10 Gbps
      r4.4xlarge    16      122            Up to 10 Gbps
      r4.8xlarge    32      244            10 Gbps
      r4.16xlarge   64      488            20 Gbps
  • 6. Clusters in the cloud are fit for purpose. [Diagram: Amazon S3 alongside groups of R4, P2, and C5 instances.]
  • 7. Clusters can scale and are elastic. [Three charts of c against t, labelled W = 1, C = 1; W = n, C = n; W = 0, C ~ 0.]
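The elasticity idea on slide 7 can be sketched as a simple scaling rule. The slide does not define W or C; this sketch assumes W is the amount of queued work and C is the cluster capacity in nodes (and so, roughly, cost). The function name `desired_capacity` and the `max_nodes` cap are hypothetical, introduced only for illustration.

```python
def desired_capacity(queued_work: int, max_nodes: int = 64) -> int:
    """Return how many nodes to run for the given amount of queued work.

    Assumption (not from the slides): one node per unit of queued work,
    capped at max_nodes, and zero nodes when the queue is empty.
    """
    if queued_work < 0:
        raise ValueError("queued_work must be non-negative")
    return min(queued_work, max_nodes)


# The three cases shown on the slide: W = 1, W = n (here n = 16), W = 0.
for w in (1, 16, 0):
    print(f"W = {w:2d} -> C = {desired_capacity(w)}")
```

The slide writes C ~ 0 rather than C = 0 for the idle case; in this sketch the node count drops to exactly zero, and any residual cost (for example data left in storage) is outside what the function models.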
  • 8. Everyone consumes S3. [Diagram labels: Collect, Share.]
  • 9. Keep storage simple – S3 and POSIX cache. [Diagram node labels: M, S, Mgt.] N.B. Data lifecycle is required.
  • 11. A few HPC and HTC tools on AWS: CfnCluster, Alces Flight, AWS Batch, and Amazon EMR.
      • CfnCluster is provided by AWS to quickly provision configurable HPC and HTC cluster environments.
      • Alces Flight is available in the AWS Marketplace and bundles 1000+ commonly used scientific applications: https://aws.amazon.com/marketplace/
      • AWS Batch provides compute resources via Docker containers with user-definable queues and an optimised job scheduler.
      • Amazon EMR provides a managed Hadoop framework supporting Apache Spark, HBase, Presto, and Flink on Amazon EC2 and EC2 Spot.
  • 12. A closer look at Alces Flight: 1000+ popular scientific applications.
      • Multiple versions, complete with libraries and various compiler optimizations, ready to run
      • Supports Docker and Singularity
      • Slurm is the default scheduler (also PBS Pro, SGE, etc.)
      • Available via the AWS Marketplace; see http://alces-flight.com/ for more information
  • 13. OpenDroneMap on Alces Flight architecture. [Diagram: R4 … nodes; r4.8xlarge, 32 vCPUs, 244 GiB RAM, 10 Gbps.]
  • 14. OpenDroneMap on Alces Flight architecture. [Diagram: R4, R4, … nodes.]
  • 15. Demo: Push-button HPC + aerial imagery processing on Alces Flight HPC
  • 17. So, how much did this cost?
      Item                                                On-demand / hr   Spot / hr            Running total
      Login node (r4.2xlarge)                             $1.91            -                    $1.91
      Compute nodes (r4.8xlarge x 4)                      $8.52            $1.20 (85% saving)   $3.60
      Shared storage (1 TB general purpose SSD via NFS)   $0.17            -                    $0.17
      Data transfer + S3 (egress)                         $0.07            -                    $0.07
      Processing time: 3 hours
      GRAND TOTAL: $5.75
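The arithmetic behind slide 17 can be checked with a few lines of Python. This is only a sketch using the figures printed on the slide; the variable names are illustrative, and the dictionary keys are shortened descriptions of the slide's rows.

```python
HOURS = 3  # processing time stated on the slide

# Running totals as listed on the slide (USD).
running_totals = {
    "login node (r4.2xlarge)": 1.91,
    "compute nodes (r4.8xlarge x 4, Spot)": 3.60,
    "shared storage (1 TB SSD via NFS)": 0.17,
    "data transfer + S3 egress": 0.07,
}

# Hourly rates for the four compute nodes, as listed on the slide (USD).
on_demand_per_hour = 8.52
spot_per_hour = 1.20

# Spot cost of the compute nodes over the whole run.
compute_spot_cost = round(spot_per_hour * HOURS, 2)

# Sum of the running-total column.
grand_total = round(sum(running_totals.values()), 2)

# Fraction saved by using Spot instead of On-Demand for the compute nodes.
spot_saving = 1 - spot_per_hour / on_demand_per_hour

print(f"Compute nodes on Spot for {HOURS} h: ${compute_spot_cost:.2f}")
print(f"Grand total: ${grand_total:.2f}")
print(f"Spot saving on compute: {spot_saving:.1%}")
```

The compute line ($1.20 x 3 hours = $3.60) and the grand total ($5.75) match the slide. The two hourly rates shown work out to a saving of just under 86%, close to the 85% quoted.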
  • 18. Evolving the compute paradigm: Physical, Virtualization, Containerization, Serverless.
  • 19. Simplifying and cutting costs further: OpenDroneMap-ECS
  • 20. Copy
  • 21. Run
  • 23. AWS Batch. Take a look at https://www.nextflow.io/
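As a rough illustration of the queue-and-container model that AWS Batch uses, the sketch below builds the request for a single job submission. The helper `build_submit_job_request`, the job, queue, and job-definition names, the command, and the S3 paths are all hypothetical; only the request keys (`jobName`, `jobQueue`, `jobDefinition`, `containerOverrides`) follow the boto3 Batch `submit_job` call. The block does not contact AWS.

```python
def build_submit_job_request(job_name, job_queue, job_definition,
                             command, environment=None):
    """Build keyword arguments for boto3's Batch submit_job call.

    All values passed in are supplied by the caller; nothing here is
    taken from the slides.
    """
    request = {
        "jobName": job_name,
        "jobQueue": job_queue,            # user-defined queue
        "jobDefinition": job_definition,  # points at a Docker image
        "containerOverrides": {"command": list(command)},
    }
    if environment:
        # Batch expects environment variables as name/value pairs.
        request["containerOverrides"]["environment"] = [
            {"name": name, "value": value}
            for name, value in sorted(environment.items())
        ]
    return request


# Hypothetical example values, for illustration only.
request = build_submit_job_request(
    job_name="odm-example",
    job_queue="example-queue",
    job_definition="example-odm-job",
    command=["run.sh", "s3://example-bucket/images/"],
    environment={"OUTPUT_PREFIX": "s3://example-bucket/results/"},
)
print(request)

# To actually submit (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("batch").submit_job(**request)
```

Keeping the request construction separate from the API call makes it easy to inspect or log what would be submitted before any resources are used.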
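  • 24. “HPC” on Lambda? PyWren (PyWren.io) lets you run your existing Python code at massive scale via AWS Lambda. CSIRO have built GT-Scan2 for CRISPR/Cas9 analysis on AWS Lambda.
        import numpy as np
        import pywren

        def my_function(b):
            # Multiply a random 1024 x 1024 matrix by a random vector,
            # both drawn from a normal distribution with standard deviation b.
            x = np.random.normal(0, b, 1024)
            A = np.random.normal(0, b, (1024, 1024))
            return np.dot(A, x)

        # Map the function over 1000 values of b, one Lambda invocation each.
        pwex = pywren.default_executor()
        res = pwex.map(my_function, np.linspace(0.1, 100, 1000))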
  • 25. Before you go home…
  • 26. Do two things!
      1. Register and enroll in the AWS Research Cloud Program: https://aws.amazon.com/rcp
      2. Launch your own personal cluster using Alces Flight: http://alces-flight.com/community
  • 27. More Information
      • High Performance Computing on AWS (whitepapers, customer examples, tech overviews)
      • AWS Compute Blog
      • Research & Technical Computing on AWS
      • AWS Research Cloud Program
      • HPC + HTC tools: CfnCluster, AWS Batch, Alces Flight
      • PyWren: Teraflops and microservices
  • 28. Source code
      • IoT push button HPC: https://github.com/AcrossTheCloud/iot-button-hpc
      • Alces Flight Open Drone Map: https://github.com/AcrossTheCloud/alces-flight-odm
      • OpenDroneMap-ECS: https://github.com/opendronemap/opendronemap-ecs
      • The slowest Internet-connected computer: https://github.com/matthewberryman/brunsviga (live at brunsviga.io)