This session demonstrates how the cloud can accelerate breakthroughs in scientific research by providing on-demand access to powerful computing. The session features scientific researchers who use the cloud to reach results faster.
Time to Science, Time to Results: Accelerating Research with AWS - AWS Symposium 2014 - Washington D.C.
1. AWS Government, Education, and Nonprofits Symposium
Washington, DC | June 24, 2014 - June 26, 2014
Accelerating Research with AWS
Steve Halliwell
shall@amazon.com
Jamie Kinney
jkinney@amazon.com
Angel Pizarro
pizarroa@amazon.com
2.
Why?
• “Work hard, have fun, make history”
• Accelerate the pace of scientific discovery
What?
• Motivations, Theory, and Practice
3.
AWS Research Grants
• Apply for credits to teach advanced courses, tackle research endeavors, and explore new projects
• Bootstrap projects that previously would have required expensive up-front and ongoing investments in infrastructure
4.
http://aws.amazon.com/solutions/case-studies/university-of-california-berkeley-amp-lab-carat-project/
5.
Some more examples
• MIT, Mark Pearrow, McGovern Institute
– Genetic and computational analysis, electrophysiological recordings, and non-invasive brain imaging
• University of Illinois Urbana-Champaign, Indranil Gupta, Computer Science
– Research issues in loosely federated clouds
• Singapore Management University, Ming Jiang
– New techniques in malware analysis
• Technion, Israel Institute of Technology, Alex Zlotnik
– Systems for efficient execution of scientific workloads
• University of Maryland, Michael Schatz, Center for Bioinformatics and Computational Biology
– Assembly of large genomes using cloud computing
• ETH Zurich, Till Quack, Computer Vision Lab
– Large scale annotation of photo collections
• University of Pennsylvania, Zachary Ives, Computer and Information Science Department
– Orchestra, collaborative data sharing system on the cloud
• Monash University, Blair Bethwaite, eScience and Grid Engineering Laboratory
– Mixing grids and clouds for high throughput science
• Harvard University, Vinothan N. Manoharan, SEAS, Department of Physics
– Exploring the physics of self-organization with digital holographic microscopy
6.
Take home message
AWS Research Grants are a great way to
bootstrap a project, or experiment on AWS
http://aws.amazon.com/grants
7.
Scientific Computing
Initiatives
Y0L0!
8.
UCSF, UCSC, UCB
BGI
University of Cape Town
UT/MD Anderson
Seven Bridges Genomics
Caltech
Monash University
Sanger Institute
Wellcome Trust
Fred Hutchinson Cancer Research Center & Sage Bionetworks
Broad Institute
OICR
U. Chicago
Plus hundreds of other sites around the world for Co-Is and Colleagues
Cancer Research UK
OHSU
RIKEN
Indian Society of Human Genetics
Global Alliance for Genomics & Health
9.
1+ Million Cancer Genome
Data Warehouse
10.
Enable collaboration
• Easily and securely share data and applications across institutions
• Publish preconfigured resources
11.
Data to the
compute
12.
Compute to the
data
13.
Download and Copy
S3
Amazon RDS
14.
Access in the Cloud
S3
Amazon RDS
15.
Compute in the Cloud
S3
Amazon RDS
16.
Baylor College of Medicine
A platform built by Baylor College of Medicine Human Genome Sequencing Center and DNAnexus using the Mercury Pipeline for the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium
Supports 300+ researchers around the world
Analyzed the genomes of over 14,000 individuals, encompassing 3,751 whole genomes and 10,940 whole exomes (~1 PB of data)
Used 3.3 million core hours over 4 weeks to complete the job 5.7x faster than what could have been accomplished on-premise
The outcomes?
1. Easier collaboration
2. Faster time to science
3. Cost-effective: On-premise was prohibitively expensive
4. No longer constrained by on-premise capacity
5. Scientists focusing on Science as opposed to infrastructure
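The core-hour and speedup figures on this slide can be turned into two derived numbers with simple arithmetic. The sketch below is illustrative only: it assumes the cores were busy evenly, around the clock, for the full 4 weeks (real usage was probably burstier), and it reads "5.7x faster" as a ratio of wall-clock durations.

```python
# Figures quoted on the slide.
CORE_HOURS = 3.3e6      # total core-hours consumed
WEEKS_ON_AWS = 4        # wall-clock duration of the run
SPEEDUP = 5.7           # quoted speedup over on-premise

HOURS_PER_WEEK = 7 * 24


def average_cores(core_hours, weeks):
    """Average number of cores busy at once, assuming even 24/7 usage."""
    return core_hours / (weeks * HOURS_PER_WEEK)


def implied_on_premise_weeks(weeks, speedup):
    """Wall-clock weeks implied by the quoted speedup (an interpretation)."""
    return weeks * speedup


avg = average_cores(CORE_HOURS, WEEKS_ON_AWS)
slow = implied_on_premise_weeks(WEEKS_ON_AWS, SPEEDUP)
print(f"average cores in use: {avg:,.0f}")
print(f"implied on-premise duration: {slow:.1f} weeks")
```

Under these assumptions the run averages a little under 5,000 cores in use at any moment, and the same job on-premise would have taken roughly 23 weeks.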
17.
AWS Public Data Sets
• A centralized repository of public datasets
• Seamless integration with cloud based applications
• No charge to the community
• Tell us what else you’d like for us to host …
1000 Genomes Project
Ensembl, GenBank, UniGene, PubChem
NASA NEX: Earth science data sets
The Cannabis Sativa Genome
US Census Data: US demographic data from 1980, 1990, and 2000 US Censuses
Freebase Data Dump: A data dump of all the current facts and assertions in the Freebase system, an open database covering millions of topics
Google Books n-grams
18.
Technical computing: Why AWS?
The IT infrastructure needed for technical computing is:
Large, complex, expensive
Poorly utilized due to project cycles
Rapidly obsolete due to technology advances
Big simulations can require days or weeks per iteration
“Time in the queue” is a growing problem in larger firms
Result? Engineering innovation is slowed
19.
Big JOB to do …
20.
… with few resources to do it.
21.
Use a large shared resource …
22.
?
… but there is a queue.
23.
The hidden cost of queues
• HPC users seek fastest possible time-to-results and must compete for scarce cluster resources
• IT support team seeks highest possible utilization of expensive cluster resources
• Result:
– The job queue becomes the buffer for managing IT capacity
– Time needed to complete simulations is too long and hard to predict
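The tension between high utilization and fast time-to-results can be illustrated with a textbook queueing formula. The sketch below is not from the talk: it assumes a single-server M/M/1 queue (random arrivals, exponentially distributed job lengths), which is a strong simplification of a real multi-node cluster scheduler, and the 10-hour mean job length is a made-up value.

```python
def mean_queue_wait(utilization, mean_job_hours):
    """Mean time a job waits before it starts, for an M/M/1 queue.

    W_q = rho / (mu * (1 - rho)) with service rate mu = 1 / mean_job_hours,
    which simplifies to rho * mean_job_hours / (1 - rho).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization * mean_job_hours / (1 - utilization)


# Hypothetical mean job length of 10 hours.
for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    wait = mean_queue_wait(rho, 10.0)
    print(f"utilization {rho:.0%}: mean wait {wait:6.1f} h")
```

In this model the wait grows without bound as utilization approaches 100%, so pushing a shared cluster toward full utilization is paid for in queue time.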
24.
Properly size your clusters …
25.
… from small …
26.
… to large …
27.
… and lots of them!
28.
Computational compound analysis
Solar panel material
Estimated serial computation time 264 years
156,314 core cluster across 8 regions
1.21 petaFLOPS (Rpeak)
Simulated 205,000 materials
18 hours for $33,000
16¢ per molecule
http://news.cnet.com/8301-1001_3-57611919-92/supercomputing-simulation-employs-156000-amazon-processor-cores/
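The per-molecule cost on this slide follows directly from the other figures, and a few related quantities can be derived the same way. This is a back-of-the-envelope sketch: it assumes all 156,314 cores ran for the full 18 hours (so the core-hour total is an upper bound) and uses a 365.25-day year for the serial estimate.

```python
# Figures quoted on the slide.
CORES = 156_314
WALL_HOURS = 18
COST_USD = 33_000
MATERIALS = 205_000
SERIAL_YEARS = 264

HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length

cost_per_material = COST_USD / MATERIALS        # dollars per simulated material
core_hours = CORES * WALL_HOURS                 # upper bound on core-hours used
cost_per_core_hour = COST_USD / core_hours      # dollars per core-hour
serial_hours = SERIAL_YEARS * HOURS_PER_YEAR    # estimated single-core runtime
wall_clock_speedup = serial_hours / WALL_HOURS  # serial time vs. 18 hours

print(f"cost per material:  {cost_per_material * 100:.1f} cents")
print(f"core-hours (max):   {core_hours:,}")
print(f"cost per core-hour: {cost_per_core_hour * 100:.2f} cents")
print(f"wall-clock speedup: {wall_clock_speedup:,.0f}x")
```

The first figure matches the 16¢ per molecule quoted on the slide; the others are derived and depend on the stated assumptions.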
29.
Time: +00h
<10 cores
30.
Time: +24h
>1500 cores
31.
Time: +72h
<10 cores
32.
AWS value for HPC
• Security: Deploy applications and store data in a secure, highly configurable VPC environment
• Agility: Deploy the right infrastructure for each technical computing job, at the right time
• Scalability: Add and subtract servers in minutes to optimize time-to-results
• Cost Savings: Pay only for what you use, don’t pay for idle or outdated servers
33.
Experiment often
Fail quickly, at low cost
More Innovation
34.
HPC Partners and Apps
35.
Kyushu University
Support seasonal demand for engineering and science
computational resources.
36.
Downstream Analysis
Compute, Analytics Tools, Databases, Storage
37.
Questions
38.
http://aws.amazon.com/solutions/case-studies/baylor/
39.
Elastic MapReduce
1. Put data in S3
2. Start a cluster (Hadoop, SGE, custom)
3. Get the results
S3, Amazon EMR
Very high, non-blocking, parallel bandwidth
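The three steps on this slide (put data in S3, start a cluster, get the results) might look roughly like the request below. This is a hedged, modern sketch rather than anything shown in the 2014 talk: it only builds a dictionary shaped like the keyword arguments of boto3's EMR `run_job_flow` call and never contacts AWS. The bucket layout, script names, cluster name, instance types, and release label are all hypothetical placeholders.

```python
def build_emr_request(bucket, core_nodes=2, spot_task_nodes=0, subnet_id=None):
    """Build keyword arguments shaped like boto3's EMR run_job_flow call.

    Step 1 (put data in S3) is assumed done: input sits under
    s3://<bucket>/input/. Step 3 (get the results) means reading
    s3://<bucket>/output/ once the cluster has finished.
    """
    groups = [
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
    ]
    if spot_task_nodes > 0:
        # Optional extra workers on Spot capacity to reduce cost.
        groups.append({"Name": "task-spot", "InstanceRole": "TASK",
                       "Market": "SPOT", "InstanceType": "m5.xlarge",
                       "InstanceCount": spot_task_nodes})

    # Shut the cluster down automatically when the step list is done.
    instances = {"InstanceGroups": groups, "KeepJobFlowAliveWhenNoSteps": False}
    if subnet_id:
        instances["Ec2SubnetId"] = subnet_id  # launch inside a VPC subnet

    step = {
        "Name": "streaming-job",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", f"s3://{bucket}/code/mapper.py,s3://{bucket}/code/reducer.py",
                "-mapper", "mapper.py",
                "-reducer", "reducer.py",
                "-input", f"s3://{bucket}/input/",
                "-output", f"s3://{bucket}/output/",
            ],
        },
    }
    return {
        "Name": "research-cluster",      # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",    # assumed release label
        "LogUri": f"s3://{bucket}/logs/",
        "Instances": instances,
        "Steps": [step],                 # step 2: start a cluster and run the job
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_emr_request("example-bucket", spot_task_nodes=4)
print([g["InstanceRole"] for g in request["Instances"]["InstanceGroups"]])
```

With boto3 installed and AWS credentials configured, a request like this could be submitted with `boto3.client("emr").run_job_flow(**request)`; that call is deliberately left out so the sketch runs offline.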
40.
Easily scale to more
computational nodes
41.
Use Spot instances to
save $$$
43.
Amazon EC2
44.
Launch in VPC for
secure computing
Editor's Notes
Carat is a free app that tells you what is using up the battery of your mobile device
Personalized actionable recommendations to increase battery life
Over 700,000 users
Research the old way: Move data around
Difficult with today’s Big Data
Leverage a large ecosystem of tools
Cohorts for Heart and Aging Research in Genomic Epidemiology project (CHARGE)
200 researchers across 5 institutions
Working to identify genes that contribute to aging and heart disease
DNA sequence of 14,000 individuals -- 3,751 whole genomes and 10,771 whole exomes
2.4 million core-hours of computational time
generated 440 TB (terabytes) of results
Nearly a petabyte of total storage
Ever growing ecosystem of tools and HPC partners