"Running high-performance scientific and engineering applications is challenging no matter where you do it. Join IT executives from Hitachi Global Storage Technology, The Aerospace Corporation, Novartis, and Cycle Computing and learn how they have used the AWS cloud to deploy mission-critical HPC workloads.
Cycle Computing leads the session on how organizations of any scale can run HPC workloads on AWS. Hitachi Global Storage Technology discusses experiences using the cloud to create next-generation hard drives. The Aerospace Corporation provides perspectives on running MPI and other simulations, and offer insights into considerations like security while running rocket science on the cloud. Novartis Institutes for Biomedical Research talks about a scientific computing environment to do performance benchmark workloads and large HPC clusters, including a 30,000-core environment for research in the fight against cancer, using the Cancer Genome Atlas (TCGA)."
3. Goals for today
• See real-world use cases from 3 leading engineering and scientific computing users
 – Steve Philpott, CIO, HGST, A Western Digital Company
 – Bill E. Williams, Director, The Aerospace Corporation
 – Michael Steeves, Sr. Systems Engineer, Novartis
• Understand the motivations, strategies, and lessons learned in running HPC / Big Data workloads in the cloud
• See the varying scales and application types that run well, including a 1.21 PetaFLOPS environment
4. Agenda
• Introduction
• Steve Philpott – Journey into Cloud
• Bill Williams – Cloud Computing @ Aerospace
• Michael Steeves – Accelerating Science
• Spot, On-Demand, & Other Production Uses
• Questions and answers
6. [HGST company overview]
• Founded in 2003 through the combination of the hard drive businesses of IBM, the inventor of the hard drive, and HGST, Ltd (+3 acquisitions)
• Acquired by Western Digital in 2012
• More than 4,200 active worldwide patents
• Headquartered in San Jose, California
• Approximately 41,000 employees worldwide
• Develops innovative, advanced hard disk drives, enterprise-class solid state drives, external storage solutions and services
• Delivers intelligent storage devices that tightly integrate hardware and software to maximize solution performance
Product lines:
• Performance Enterprise: Ultrastar® SAS 10K & 15K HDDs
• PCIe Enterprise SSD
• Capacity Enterprise: Ultrastar® 7200 RPM & CoolSpin HDDs
• Cloud & Datacenter: Ultrastar® & MegaScale DC™
7. Zero to Cloud in 6+ Months
By 31 Oct 2013:
• Cloud eMail – Microsoft Office365 (April 2013)
• Cloud eMail archiving/eDiscovery
• External Single Sign-On (off VPN)
• Cloud File/Collaboration – Box
• Cloud CRM – Salesforce.com, integrated to save files in Box
• Cloud High-Performance Computing (HPC) on Amazon AWS
• Cloud Big Data platform on Amazon AWS
8. Responding to the Changing Business Model
Where is our business model headed? "New Age of Innovation" as a guide:
• N=1: Focus on the individual customer experience
• R=G: Resources are global
Implications:
– Increase in strategic partnering
– Need for a high level of flexibility
– Leveraging external expertise
Use of the cloud/SaaS aligns with the virtual business model:
• Variable cost model critically important
• Lightweight, scalable services
• Reduced up-front capital spend
• Accelerated provisioning
• Pay as you go
9. Paradigm Shift: Consumerization of IT
"I have better technology at home"
The consumer web set a new paradigm in ease of use and reduced cost. It has been driven by a series of platforms, and these platforms are household brand names today. When we use these platforms, it continually amazes us how easily and consistently they work. A new set of services: from DRM to iTunes.
Yet our workplace applications are cumbersome, costly, difficult to navigate, and require extensive support.
(Workday, 2009)
10. The Big Switch – The Box Has Disappeared
The transformation of computing as we know it.
• Physical to virtual/digital move
 – Do you really care which computer processed your last Google search?
• Efficiency
 – Do not waste a CPU cycle or a byte of memory; today we build a 4-story building and use only the 1st floor.
• Utility: IT as a Service – plug it in and get it
 – Where the electricity industry has gone, computing is following
 – The computing shift is almost invisible to the end user
• DATA is the value to the organization, not the "where"
11. Enabling the Virtual Organization
Reframing IT away from thinking of "The App":
• Business intelligence and analytics
• End-to-end business processes
• Enterprise data management
• New computing platforms
• Strategic outsourcing
• Software as a Service (SaaS)
New IT organizational structures: support and align to the "new business model"
12. Creating an Innovation Playground: Where to Start and How to Evolve
IT supports business strategy; executive buy-in (CEO, CIO, InfoSec, etc.); reduce cap-ex and optimize datacenter usage.
Phases: Awareness → Understanding → Transition → Commitment (Learn → Play → Knowledge → Implement; outcome defined; build expertise)
Educate (Awareness):
• Team involvement
• Conferences
• Vendor briefings
• Expert services
• Best practices
Experiment (Understanding):
• Team approach
• Hands-on approach
• Understand the value proposition
• Understand constraints
• Identify apps fit for cloud computing
• Define new processes
• Collaborate with other companies
Migrate (Transition):
• Migrate dev/test environments
• Migrate or launch new apps on the cloud
Embrace success (Commitment):
• Showcase cost savings
• Build an enterprise cloud strategy
• Learn from each experience
• Expand accordingly
13. Multiple Opportunities to Leverage Amazon Web Services (AWS)
AWS has "more than 5x the compute capacity of the next 14 providers combined" (Gartner, Aug 2013): access to massive compute and storage, billed by the hour, paying only for what is used.
HGST Japan Research Lab: using AWS for a higher-performance, lower-cost, faster-deployed solution vs. buying a huge on-site cluster.
Develop AWS competency:
• Many opportunities: in-house and commercial HPC applications are "cloud ready"
• Provide computing when needed: reduce capital investment and risk, and increase flexibility
• Faster response to business needs: rapid prototyping to pilot new IT capabilities with a "PO process"; set up users, allocate compute and storage in minutes, load apps, and go
• AWS provides a great option for disaster recovery for our "on-premise" clusters and storage
14. HGST’s Amazon HPC Platform
[Chart: number of atoms handled (10^3 to 10^7) vs. number of cores (0–600), scaling from basic molecular simulation to large-scale molecular simulation for HDI. Case 1: relaxation time < 1 ns; Case 2: relaxation time 5 ns; Case 3: lube depletion in TAR (2D heat profile, ~300,000 atoms, 36 nm heat spot); top view of lube molecules spreading onto COC.]
Simulation domains and applications: molecular dynamics (MAGLAND), read/write magnetics and electro-magnetic fields (commercial LLG, CST, Ansys HFSS), mechanical (Ansys), plus pre- and post-processing server farms.
Base HPC Platform: scalable to thousands of instances to support numerous simultaneous simulations. New G2 instances add visualization capabilities.
15. Big Data’s "3 V’s"
Best pragmatic definition, from Snijders et al.: "Data sets so large and complex that they become awkward to work with using standard tools and techniques."
Trends across the three V’s:
• Volume (data collected; analysis & metadata creation): from terabytes to petabytes & exabytes
• Velocity (data acquisition; analysis & action): from batch to real-time & streaming
• Variety (data sources, data types, applications): from structured to unstructured, semi-structured & structured
Key difference: data structure does not need to be defined before loading.
Implications & opportunities:
• Hardware and software optimization
• Architectural shifts: scale-out systems, distributed filesystems, tiered storage, Hadoop…
16. Big Data Platform
Data sources (slider, wafer, media, substrate, HGA, HDD, SAP/DWs, supplier data, field data) feed all raw parametric, logistic, and vintage data into the platform as raw extracts.
Parallelized batch analytics turn the raw extracts into enriched, end-to-end integrated data, exposed through app-specific views and a new unified EDW.
Consumers and use cases:
• Optimize/reduce testing
• Failure screen tests
• Proactive drift identification
• Customer FA via field data
• Ad hoc analysis
• New high-value parameters
Analysis and visualization: SAS, Compellon, or other predictive analytic tools; Tableau and other tools.
17. Characteristics of a "Typical" Hadoop / Big Data Cluster
Hadoop handles large data volumes and reliability in the software tier:
− Hadoop distributes data across the cluster and uses replication to ensure data reliability and fault tolerance.
Each machine in a Hadoop cluster stores AND processes data; machines must do both well. Processing is sent directly to the machines storing the data.
Hadoop MapReduce compute-bound operations and workloads:
• Clustering/classification
• Complex text mining
• Natural-language processing
• Feature extraction
Hadoop MapReduce I/O-bound operations and workloads:
• Indexing
• Grouping
• Data importing and exporting
• Data movement and transformation
Big Data solutions must support a large variety of compute and I/O operations and storage needs… enter "the Cloud"
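The map/reduce split described above can be sketched in miniature. This is a hypothetical in-process Python simulation of the pattern, not the Hadoop API; in real Hadoop, map tasks run on the nodes that store each data block, which is what makes the store-and-process design pay off.

```python
from collections import defaultdict

def map_phase(records):
    # map: emit (key, value) pairs from each input record
    # (in Hadoop, this runs locally on the node holding the data block)
    for line in records:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # shuffle/reduce: group values by key and aggregate per key
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

counts = reduce_phase(map_phase(["to be or not to be"]))
print(counts["to"])  # 2
```

The same two-function shape covers both columns on the slide: compute-bound jobs do heavy work in the map step, I/O-bound jobs (indexing, grouping, import/export) lean on the shuffle/reduce step.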
18. AWS Big Data Platform Storage Services
• Amazon EBS: block storage for elastic computing; optimized for performance (SSD / 15K / 10K); highly virtualized / SAN-based
• Amazon S3: "generic" object storage; the bulk of AWS storage today; virtualized or reserved use; server/network-based
• Amazon Glacier: cold/cool storage; lowest-cost model for the least-used data; 3-5 hour latency / sequentialized
19. HGST’s Other Amazon Use Cases/Capabilities
• Petabyte-scale data warehousing ("between Glacier & S3")
• Running data visualization tools in AWS
• Resource tracking tool, including a Tableau instance for reporting and visualization
More and more users are coming to IT asking how to leverage this new compute capability.
20. We Are Just Starting with the Cloud
• Current results are from a 6-month effort
• Re-aligning business group leadership
• Demand and use will grow and accelerate
Cloud + HGST IT = strong innovation and business partner
22. Introduction and Background
• IT Executive for The Aerospace Corporation (Aerospace)
• Manage HPC compute and cloud resources for the Aerospace corporation
• Career path has taken me through end-user support, system administration, and enterprise architecture
23. Agenda
• Who is Aerospace?
• High Performance Computing @ Aerospace
• Services Provided
• Cloud Motivation
• Where are we today?
• What makes this work?
• Challenges
• Lessons Learned
25. High Performance Computing @ Aerospace
• Allow engineers and scientists to focus on their discipline and research
• Reduce and eliminate complexity in using High Performance Computing (HPC) resources
• Supply and support centralized and networked HPC resources
27. Cloud Motivation
• Respond to an increasing and variable demand
• Improve resource deployments and use
• Enhance provisioning
• Improve security posture
• Improve disaster recovery posture
• Greener
28. Where are we today?
• Successfully established elastic clusters in AWS GovCloud
 – Workload runs include Monte Carlo and array simulations
• Key features of the GovCloud clusters are auto-scaling and on-demand computing
• Compute instances are created as needed to meet job computational requirements
• Making strides towards mimicking internal clusters in GovCloud
29. What makes this work?
• AWS GovCloud
– GovCloud is FedRAMP compliant
• Secure transport to and from Aerospace
– VPC provides an additional layer of security while data is in transit
• Cycle Computing
 – Cycle provides cluster auto-scaling
30. Lessons Learned
• Enhanced analytics and business intelligence
• Customer success stories
• Standard images
• Demonstrated operational “agility”
31. Lessons Learned
• Domain space is dynamic
• Expertise required
• Layers of complexity
• Ensuring data security (in a hybrid deployment model)
32. Challenges
• Establishing a cloud storage infrastructure
• Determining appropriate bandwidth between Aerospace and GovCloud
• Library replication of internal systems
• System integration with internal authentication services
• Ensuring a seamless transition to hybrid services
33. What’s Next?
• Expand offerings
• Explore chargeback
• Explore "cloudifying" other HPC platforms
• Track technology
• Provide workload-specific ad hoc offerings
• Provide surge capability for HPC resources
35. Novartis Institutes for BioMedical Research (NIBR)
• Unique research strategy driven by patient needs
• World-class research organization with about 6,000 scientists globally
• Intensifying focus on molecular pathways shared by various diseases
• Integration of clinical insights with mechanistic understanding of disease
• Research-to-development transition redefined through fast and rigorous "proof-of-concept" trials
• Strategic alliances with academia and biotech strengthen the preclinical pipeline
36. Accelerating the Science
Requirements:
• Large-scale computational chemistry simulation
• Results in under a week
• Ability to run multiple experiments "on-demand"
Challenges:
• Sustained access to 50,000+ compute cores
• Ability to monitor and re-launch jobs
• No additional capital expenditure
• Internal HPCC already running at capacity
Job profile:
• Embarrassingly parallel
• CPU bound
• Low I/O, memory, and network requirements
[Diagram: virtual screening — compound molecules ("keys") docked against a target molecule’s binding site ("lock")]
37. The Cloud: Flexible Science on Flexible Infrastructure
Engineering the right infrastructure for a workload:
• Software runs the same job many times across instance types
• Measures the throughput and determines the $ per job
• Use the instances that provide the best scientific ROI
The CC2 instance (Intel Xeon® ‘Sandy Bridge’) ran best for this workload.
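The benchmarking loop above amounts to ranking instance types by dollars per completed job. A minimal sketch, where the hourly prices and throughputs are illustrative assumptions rather than measured AWS figures:

```python
def cost_per_job(hourly_price, jobs_per_hour):
    """Dollars spent per completed job on one instance type."""
    return hourly_price / jobs_per_hour

# Hypothetical benchmark results: instance type -> ($/hour, jobs/hour)
benchmarks = {
    "cc2.8xlarge": (2.40, 150.0),
    "m1.xlarge":   (0.48, 15.0),
    "c1.xlarge":   (0.58, 30.0),
}

# Rank candidates by $ per job; the cheapest job wins, regardless of
# which instance has the lowest sticker price per hour.
ranked = sorted(
    (cost_per_job(price, rate), itype)
    for itype, (price, rate) in benchmarks.items()
)
best_cost, best_type = ranked[0]
print(best_type, round(best_cost, 3))  # cc2.8xlarge 0.016
```

Note the point the slide makes implicitly: the most expensive instance per hour can still deliver the best scientific ROI if its throughput is high enough.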
38. Super Computing in the Cloud
Metric | Count
Compute hours of science | 341,700 hours
Compute days of science | 14,238 days
Compute years of science | 39 years
AWS instance count (CC2) | 10,600 instances
• $44 million of infrastructure
• 10 million compounds screened
• 39 drug-design years in 11 hours for a cost of… $4,232
• 3 compounds identified and synthesized for screening
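The hour/day/year figures in the table above are internally consistent; a quick check:

```python
# Sanity-check the slide's compute-time arithmetic.
hours = 341_700
days = hours / 24    # 14,237.5, which the slide rounds to 14,238
years = days / 365   # ~39.0
print(round(days), round(years))  # 14238 39
```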
39. Key Learnings / What’s Next?
Key learnings:
• Diversity of life sciences brings unique challenges
• Spend the time analyzing and tuning
• Flexibility, scalability, and performance
• Time to rethink and retool
• Challenge the science and the scientist
• Collaboration
Future plans:
• Chemical universe: 166 billion compounds (extreme-scale CPU)
• Next-generation sequencing in the cloud (extreme CPU, memory, I/O)
• "Disruptive" technologies: imaging (10x that of NGS!)
40. Using On-Demand and Spot Instances Together
When task durations are longer than 1 hour, or tasks require multiple machines (MPI) for long periods, use on-demand instances.
Shorter workloads work great on Spot Instances.
If you want a guaranteed end time, use on-demand as well, so the architecture looks like…
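The rule of thumb above can be sketched as a simple scheduling function; the one-hour threshold comes from the slide, while the function and parameter names are illustrative assumptions:

```python
def choose_capacity(task_hours, needs_mpi, needs_guaranteed_finish=False):
    """Pick 'on-demand' or 'spot' capacity for a single task."""
    if task_hours > 1 or needs_mpi or needs_guaranteed_finish:
        # Long, multi-node (MPI), or deadline-bound work: a Spot
        # interruption would waste too much progress.
        return "on-demand"
    # Short, restartable tasks tolerate Spot reclamation.
    return "spot"

print(choose_capacity(0.5, needs_mpi=False))  # spot
print(choose_capacity(6.0, needs_mpi=True))   # on-demand
```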
41. CycleCloud Deploys Secured, Auto-scaled HPC Clusters
Scale from 150 to 150,000+ cores.
[Diagram: user submits jobs → check job load → calculate ideal HPC cluster → properly price the Spot bids → manage Spot Instance loss]
HPC cluster components:
• On-demand execute nodes (guaranteed finish)
• Spot Instance execute nodes (auto-started & auto-stopped; calculation is faster/cheaper)
• Load-based Spot bidding
• Shared FS, backed by FS / S3
• HPC orchestration to handle Spot Instance bids and loss
• Connection to legacy internal HPC
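The "check job load → calculate ideal HPC cluster" loop in the diagram can be sketched as follows; all names and the jobs-per-node ratio are illustrative assumptions, not the CycleCloud API:

```python
def ideal_cluster_size(queued_jobs, jobs_per_node=4, max_nodes=1000):
    """Nodes needed to drain the queue, capped at a budget limit."""
    needed = -(-queued_jobs // jobs_per_node)  # ceiling division
    return min(needed, max_nodes)

def scaling_action(current_nodes, queued_jobs):
    """Decide whether to start or stop nodes based on queue depth."""
    target = ideal_cluster_size(queued_jobs)
    if target > current_nodes:
        return ("start", target - current_nodes)
    if target < current_nodes:
        return ("stop", current_nodes - target)
    return ("hold", 0)

print(scaling_action(10, 100))  # ('start', 15)
```

In the real architecture the "start" branch would also decide the on-demand/Spot split and place Spot bids, and a separate path would replace nodes lost to Spot reclamation.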
42. Other Production Use Cases
• Sequencing, genomics, life sciences
• MPI workloads for FEA, CFD, energy, utilities
• MATLAB and R applications for stats/modeling
• Windows HPC Server cluster for finance
• Heat transfer and other FEA
• Insurance risk management
• Rendering/VFX
43. Designing Solar Materials
The challenge is efficiency: we need to efficiently turn photons from the sun into electricity.
The number of possible materials is limitless:
• Need to separate the right compounds from the useless ones
• If the 20th century was the century of silicon, the 21st will be all organic
How do we find the right material out of 205,000 without spending the entire 21st century looking for it?
EMBARGOED until Nov. 12, 2013 8 a.m. EST
51. Question and Answer
How does utility HPC apply to your organization?
Follow us: @cyclecomputing, @jasonastowe
Come to Cycle’s booth: #1112
We’re hiring: jointheteam@cyclecomputing.com
52. Please give us your feedback on this presentation (BDT212)
As a thank you, we will select prize winners daily for completed surveys!