AWS is a great fit for both steady-state and episodic computational workloads. Here we present some common architecture patterns for analyzing genomic and other biomedical data on scalable, high-throughput computational clusters on AWS. This talk covers bootstrapping a traditional Beowulf compute cluster on AWS EC2, and data transfer and storage strategies for S3.
3. AWS Rapid Pace of Innovation
2006 (+2): Amazon S3, Amazon SQS
2007 (+1): Amazon FPS
2008 (+24): Amazon EBS, Amazon CloudFront
2009 (+48): Amazon RDS, Amazon VPC, Auto Scaling, Elastic Load Balancing
2010 (+61): Amazon SNS, AWS Identity & Access Management, Amazon Route 53
2011 (+82): Amazon ElastiCache, Amazon SES, AWS CloudFormation, AWS Direct Connect, AWS Elastic Beanstalk, GovCloud
2012 (+159): Amazon SWF, Amazon Redshift, Amazon Glacier, Amazon DynamoDB, Amazon CloudSearch, AWS Storage Gateway, AWS Data Pipeline
2013 (+280): Amazon CloudTrail, Amazon CloudHSM, Amazon WorkSpaces, Amazon Kinesis, Amazon Elastic Transcoder, Amazon AppStream, AWS OpsWorks
2014 (+516): Amazon Cognito, Amazon Zocalo, Amazon Mobile Analytics, AWS Directory Service, Amazon RDS for Aurora, AWS CodeDeploy, AWS Lambda, AWS Config, AWS Key Management Service, Amazon EC2 Container Service
Since inception AWS has:
• Released 1173 new services and features
• Introduced more than 40 major new services
• Announced 47 price reductions
*as of Jan 28, 2015
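The yearly "+N" figures on slide 3 can be cross-checked against the stated total of 1173 releases. A minimal Python sketch, assuming each "+N" belongs to the year printed next to it on the slide (that pairing is read from the slide layout, not stated explicitly):

```python
# New services and features released per year, transcribed from slide 3.
# Assumption: each "+N" figure belongs to the year shown beside it.
releases_per_year = {
    2006: 2, 2007: 1, 2008: 24, 2009: 48, 2010: 61,
    2011: 82, 2012: 159, 2013: 280, 2014: 516,
}

total = sum(releases_per_year.values())
print(total)  # compare with the "Released 1173 new services and features" bullet
```

Under that reading the yearly figures add up to the total quoted on the slide.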
4. Gartner “Magic Quadrant for Cloud Infrastructure as a Service,” Lydia Leong, Douglas Toombs, Bob Gill, Gregor Petri, Tiny Haynes, May 28, 2014. This Magic Quadrant graphic was published by Gartner,
Inc. as part of a larger research note and should be evaluated in the context of the entire report. The Gartner report is available at http://aws.amazon.com/resources/analyst-reports/. Gartner does not
endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications
consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including
any warranties of merchantability or fitness for a particular purpose.
5. Enterprise Applications: Virtual Desktops, Collaboration and Sharing
Platform Services
  Databases: Caching, Relational, NoSQL
  Analytics: Hadoop, Real-time, Data Workflows, Data Warehouse
  App Services: Queuing, Orchestration, App Streaming, Transcoding, Email, Search
  Deployment & Management: Containers, Dev/ops Tools, Resource Templates, Usage Tracking, Monitoring and Logs
  Mobile Services: Identity, Sync, Mobile Analytics, Notifications
Foundation Services: Compute (VMs, Auto-scaling and Load Balancing), Storage (Object, Block and Archive), Security & Access Control, Networking
Infrastructure: Regions, Availability Zones, CDN and Points of Presence
6. Over 1 million active customers across
190 countries
800+ government agencies
3,000+ educational institutions
11 regions
28 availability zones
52 edge locations
Every day, AWS adds enough new server capacity to support
Amazon.com when it was a $7 billion global enterprise.
9. • Resizable compute capacity in >25 instance types
• Reduces the time required to obtain and boot new server
instances to minutes or seconds
• Scale capacity as your computing requirements change
• Pay only for capacity that you actually use
• Choose Linux or Windows
• Deploy across Regions and Availability Zones for reliability
• Support for virtual network interfaces that can be attached to
EC2 instances in your VPC
16. Cluster instances deployed in a ‘Placement Group’ enjoy low-latency, full-bisection 10 Gbps bandwidth
25. [Diagram labels: AWS region; AZ A; VPC 10.0.0.0/16; SN 10.0.1.0/24 (DMZ); SN 10.0.2.0/24 (Private); NAT; Internet GW Service; nodes M, E, E, E]
26. [Diagram: same layout as slide 25, with three additional nodes labeled S, S, S]
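The address plan in the two diagrams above (one VPC block split into a DMZ subnet and a private subnet) can be sanity-checked before anything is provisioned. A small sketch using only Python's standard ipaddress module; the CIDR blocks come from the slides, while the dictionary keys and the helper name are just illustrative:

```python
import ipaddress

# CIDR blocks as drawn on slides 25 and 26.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "DMZ": ipaddress.ip_network("10.0.1.0/24"),
    "Private": ipaddress.ip_network("10.0.2.0/24"),
}

def check_plan(vpc, subnets):
    """Return {name: (inside_vpc, address_count)} for each subnet."""
    return {
        name: (net.subnet_of(vpc), net.num_addresses)
        for name, net in subnets.items()
    }

plan = check_plan(vpc, subnets)
overlap = subnets["DMZ"].overlaps(subnets["Private"])

for name, (inside, count) in plan.items():
    print(f"{name}: inside VPC={inside}, addresses={count}")
print("subnets overlap:", overlap)
```

Both /24 subnets fall inside the /16 and do not overlap each other, which is the property the layout depends on.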
29. cfncluster - provision an HPC cluster in minutes
cfncluster is a framework that deploys and maintains High
Performance Computing (HPC) clusters on AWS. It is reasonably
agnostic to what the cluster is for and can easily be extended to
support different frameworks. The CLI is stateless; everything is
done using CloudFormation or resources within AWS.
https://github.com/awslabs/cfncluster
31. Infrastructure as code
#cfncluster
The creation process might take a few minutes (maybe up
to 5 minutes or so, depending on how you configured it).
Because the API to CloudFormation (the service that
does all the orchestration) is asynchronous, we could kill
the terminal session if we wanted to and watch the whole
show from the AWS console, where you’ll find it all under
the “CloudFormation” dashboard, in the Events tab for this
stack.
$ cfncluster create boof-cluster
Starting: boof-cluster
Status: cfncluster-boof-cluster - CREATE_COMPLETE
Output:"MasterPrivateIP"="10.0.0.17"
Output:"MasterPublicIP"="54.66.174.113"
Output:"GangliaPrivateURL"="http://10.0.0.17/ganglia/"
Output:"GangliaPublicURL"="http://54.66.174.113/ganglia/"
32. Yes, it’s a real HPC cluster
#cfncluster
arthur ~ [26] $ cfncluster create boof-cluster
Starting: boof-cluster
Status: cfncluster-boof-cluster - CREATE_COMPLETE
Output:"MasterPrivateIP"="10.0.0.17"
Output:"MasterPublicIP"="54.66.174.113"
Output:"GangliaPrivateURL"="http://10.0.0.17/ganglia/"
Output:"GangliaPublicURL"="http://54.66.174.113/ganglia/"
arthur ~ [27] $ ssh ec2-user@54.66.174.113
The authenticity of host '54.66.174.113 (54.66.174.113)' can't be established.
RSA key fingerprint is 45:3e:17:76:1d:01:13:d8:d4:40:1a:74:91:77:73:31.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.66.174.113' (RSA) to the list of known hosts.
[ec2-user@ip-10-0-0-17 ~]$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 10185764 7022736 2639040 73% /
tmpfs 509312 0 509312 0% /dev/shm
/dev/xvdf 20961280 32928 20928352 1% /shared
[ec2-user@ip-10-0-0-17 ~]$ qhost
HOSTNAME       ARCH      NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO  SWAPUS
------------------------------------------------------------------------------
global         -         -    -    -    -    -    -      -      -       -
ip-10-0-0-136  lx-amd64  8    1    4    8    -    14.6G  -      1024.0M -
ip-10-0-0-154  lx-amd64  8    1    4    8    -    14.6G  -      1024.0M -
[ec2-user@ip-10-0-0-17 ~]$ qstat
[ec2-user@ip-10-0-0-17 ~]$
[ec2-user@ip-10-0-0-17 ~]$ ed hw.qsub
hw.qsub: No such file or directory
a
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -pe mpi 2
#$ -S /bin/bash
#
module load openmpi-x86_64
mpirun -np 2 hostname
.
w
110
q
[ec2-user@ip-10-0-0-17 ~]$ ll
total 4
-rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
[ec2-user@ip-10-0-0-17 ~]$ qsub hw.qsub
Your job 1 ("hw.qsub") has been submitted
[ec2-user@ip-10-0-0-17 ~]$
[ec2-user@ip-10-0-0-17 ~]$ qstat
job-ID prior   name    user     state submit/start at     slots ja-task-ID
---------------------------------------------------------------------------
1      0.55500 hw.qsub ec2-user r     02/01/2015 05:57:25 10-0-0-44.ap-southeas 2
[ec2-user@ip-10-0-0-17 ~]$ qstat
[ec2-user@ip-10-0-0-17 ~]$ ls -l
total 8
-rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
-rw-r--r-- 1 ec2-user ec2-user 26 Feb 1 05:57 hw.qsub.o1
[ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub.o1
ip-10-0-0-136
ip-10-0-0-154
[ec2-user@ip-10-0-0-17 ~]$
Now you have a cluster, probably running CentOS 6.x, with Sun Grid
Engine as the default scheduler, and Open MPI and a bunch of other stuff
installed. You also have a shared filesystem in /shared and an
autoscaling group ready to expand the number of compute nodes in the
cluster when the existing ones get busy.
You can customize quite a lot via the .cfncluster/config file - check out the
comments in that file.
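The hw.qsub file typed into ed in the session above can also be generated, which helps once the slot count needs to vary between runs. A hedged Python sketch: the directives, the "mpi" parallel environment name and the module name are copied from the slide, while the function name and its parameters are made up for illustration:

```python
def sge_job_script(slots: int, command: str = "hostname") -> str:
    """Build the text of an SGE job script shaped like hw.qsub on the slide."""
    lines = [
        "#!/bin/bash",
        "#",
        "#$ -cwd",                # run the job from the submission directory
        "#$ -j y",                # merge stderr into the stdout file
        f"#$ -pe mpi {slots}",    # request slots in the 'mpi' parallel environment
        "#$ -S /bin/bash",        # shell used to interpret the job script
        "#",
        "module load openmpi-x86_64",
        f"mpirun -np {slots} {command}",
    ]
    return "\n".join(lines) + "\n"

script = sge_job_script(2)
size = len(script.encode())
print(script, end="")
print(size, "bytes")
```

For two slots the generated text is the same size as the file written in the session (110 bytes, per the ed and ls output). Saving it as hw.qsub and running qsub hw.qsub on the master node would then follow the same path as on the slide.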
34. Customer
Customer Data
Platform, Applications, Identity & Access Management
Operating System, Network & Firewall Configuration
Client-side Data Encryption & Data Integrity Authentication
Server-side Encryption (File System and/or Data)
Network Traffic Protection (Encryption/Integrity/Identity)
• Customers implement their own set of controls
• Multiple customers with FISMA Low and Moderate ATOs
Amazon
Foundation Services: Compute, Storage, Database, Networking
AWS Global Infrastructure: Regions, Availability Zones, Edge Locations
• SOC 1/SSAE 16/ISAE 3402
• SOC 2
• ISO 27001/2 Certification
• Payment Card Industry (PCI) Data Security Standard (DSS)
• NIST Compliant Controls
• DoD Compliant Controls
• FedRAMP
• HIPAA and ITAR Compliant
35. AWS: Facilities, Physical security, Compute infrastructure, Storage infrastructure, Network infrastructure, Virtualization layer (EC2), Hardened service endpoints, Rich IAM capabilities
+
Customer/Partner: Network configuration, Security groups, OS firewalls, Operating systems, Applications, Proper service configuration, Auth & acct management, Authorization policies
=
• Re-focus your security professionals on a subset of the problem
• Take advantage of high levels of uniformity and automation
First global public cloud provider to achieve certification for security & quality management system
38. AWS region – VPC network isolation
[Diagram labels: AZ A, AZ B; VPC 10.0.0.0/16; SN 10.0.1.0/24 (DMZ); SN 10.0.2.0/24 (Private); EC2 10.0.1.11; EC2 10.0.2.12; (23.20.103.11); Internet; Internet GW Service; Virtual Gateway]
41. You get to choose who can do what in your AWS environment and from where
[Diagram labels: Manage and operate; US EAST; VPC; A, B; SM; Internet GW Service; Virtual Gateway; (EIP), (EIP); AWS account owner (master); Network & security; Researcher; Operations; EMR]
43. Amazon S3
• HTTPS
• AES-256 server-side encryption
• AWS or customer managed keys
• Each object gets its own key
Amazon EBS
• End-to-end secure network traffic
• Whole volume encryption
• AWS or customer managed keys
• Encrypted incremental snapshots
• Minimal performance overhead (utilizes Intel AES-NI)
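For S3, server-side encryption is requested per object through headers on the upload request. A minimal sketch of building those headers in Python, without making any network call. The header names are the ones S3 documents for server-side encryption; the function name and the placeholder key ID are hypothetical, and reading "customer managed keys" as an AWS KMS key is an assumption, since the slide does not say which mechanism is meant:

```python
def sse_put_headers(kms_key_id=None):
    """Return S3 PUT request headers that ask for server-side encryption.

    Without a key ID, S3-managed AES-256 keys are requested. With a key ID,
    a customer-managed AWS KMS key is requested instead (an assumed reading
    of "customer managed keys" on the slide).
    """
    if kms_key_id is None:
        return {"x-amz-server-side-encryption": "AES256"}
    return {
        "x-amz-server-side-encryption": "aws:kms",
        "x-amz-server-side-encryption-aws-kms-key-id": kms_key_id,
    }

aws_managed = sse_put_headers()
customer_managed = sse_put_headers("example-key-id")  # placeholder, not a real key
print(aws_managed)
print(customer_managed)
```

The same upload should still go over HTTPS, as the first bullet says; the headers only control encryption at rest.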
45. AWS CloudTrail
• Records API calls, no matter how those API calls were made (console, SDK, CLI)
• Who did what and when and from what IP address
• Logs saved to Amazon S3
• Includes EC2, Amazon EBS, VPC, Amazon RDS, IAM, AWS STS, and Amazon Redshift
• Be notified of log file delivery by using the Amazon Simple Notification Service (SNS)
• Aggregate log information across services into a single S3 bucket
• Out-of-the-box integration with log analysis tools from AWS partners including Splunk, AlertLogic, and SumoLogic
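The "who did what and when and from what IP address" bullet maps directly onto fields in each CloudTrail record. A short Python sketch that pulls those four facts out of a log document already downloaded from S3; the field names (Records, eventTime, eventSource, eventName, sourceIPAddress, userIdentity) follow the CloudTrail record format, while the sample record itself is invented for illustration:

```python
import json

# Invented sample in the shape of a CloudTrail log file (not real log data).
SAMPLE_LOG = """
{"Records": [
  {"eventTime": "2015-02-01T05:57:25Z",
   "eventSource": "ec2.amazonaws.com",
   "eventName": "RunInstances",
   "sourceIPAddress": "203.0.113.10",
   "userIdentity": {"type": "IAMUser", "userName": "researcher"}}
]}
"""

def summarize(log_text):
    """Return (who, what, when, source IP) tuples from a CloudTrail log document."""
    rows = []
    for rec in json.loads(log_text)["Records"]:
        identity = rec.get("userIdentity", {})
        # Fall back to the identity type when no user name is present.
        who = identity.get("userName") or identity.get("type", "unknown")
        what = f'{rec["eventSource"]}:{rec["eventName"]}'
        rows.append((who, what, rec["eventTime"], rec["sourceIPAddress"]))
    return rows

summary = summarize(SAMPLE_LOG)
for row in summary:
    print(row)
```

In practice the partner tools listed above do this at scale; the sketch only shows which fields answer the audit question.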
49. AWS HIPAA Program
Aligning services and workloads to the HIPAA Security Rule
Bill Shinn, AWS Principal Security Solutions Architect
50. AWS HIPAA Program
Strong presence in healthcare and life sciences from our roots
Business Associates & January 2013 Omnibus Final Rule
Started signing Business Associate Agreements (BAA) in Q2 2013
Program is based on Shared Security Responsibility Model
AWS HIPAA Program is aligned to NIST 800-53 & FedRAMP Authorizations
51. Alignment to HIPAA Security Rule
HIPAA Security Rule (45 CFR Part 160 and Subparts A and C of Part 164)
NIST 800-66: An Introductory Resource Guide for Implementing the Health Insurance Portability and Accountability Act (HIPAA) Security Rule
NIST 800-53: Moderate baseline + FedRAMP Controls
52. AWS HIPAA Eligible Services
Customers may use all services within a “HIPAA Account”
Customers may process, store, or transmit ePHI using only Eligible Services
Eligible Services: Amazon EC2, Elastic Load Balancing (TCP mode only), Amazon S3, Amazon EBS, Amazon Glacier, Amazon Redshift
53. AWS HIPAA configuration requirements
Customers must encrypt ePHI in transit and at rest
Customers must use EC2 Dedicated Instances for instances processing, storing, or transmitting ePHI
Customers must record and retain activity related to use of and access to ePHI
54. Office for Civil Rights Audit Protocol & Shared Security Responsibility (example – 45 CFR 164.312(b))
Section: §164.312(b)
Established Performance Criteria: Audit controls - Implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information.
Key Activity: Determine the Activities that Will be Tracked or Audited. Inquire of management as to whether audit controls have been implemented over information systems that contain or use ePHI. Obtain and review documentation relative to the specified criteria to determine whether audit controls have been implemented over …
Customer Responsibility: Yes
AWS Responsibility: Yes
AWS Certification Reference: NIST 800-53 AU-1, AU-2, AU-3, AU-4, AU-6, AU-7
Additional Guidance: Customers processing, storing or transmitting ePHI in AWS must utilize a level of audit logging sufficient to record all activity related to use of and access to protected health information. When using services such as Amazon S3 or Amazon Redshift, customers should evaluate native logging features such as Amazon S3 bucket logging to determine how these features may assist in meeting the implementation specification.
55. AWS HIPAA Web Tier Reference Architecture
[Diagram labels: AZ A, AZ B; VPC Public Subnet 10.40.1.0/24; VPC Public Subnet 10.40.2.0/24; VPC Private Subnet 10.40.3.0/24; VPC Private Subnet 10.40.4.0/24; four HAProxy nodes (Public SSL); four Web Server nodes (Private SSL); SG: ELBSecurityGroup; SG: HAProxySecurityGroup; SG: WebSecurityGroup]
Public ELB in TCP mode w/ Proxy Protocol
HAProxy tier – if needed, session state managed via client-side cookie inserted by HAProxy. SSL termination/re-encryption. Keys stored in Amazon S3, retrieved by AWS CloudFormation at system launch using entitlements of IAM role for Amazon EC2. Support for Proxy Protocol & x-forwarded-for.
HAProxy tier performs backend encryption between HAProxy nodes and Web nodes. Keys stored in Amazon S3, retrieved by AWS CloudFormation at system launch using entitlements of IAM role for Amazon EC2.
57. On-demand instances
Unix/Linux instances start at $0.02/hour
Pay as you go for compute power
Low cost and flexibility
• Pay only for what you use, no up-front commitments or long-term contracts
Use Cases:
• Applications with short term, spiky, or unpredictable workloads
• Application development or testing
Reserved instances
1 or 3 year terms
Pay low up-front fee, receive significant hourly discount
Low Cost / Predictability
• Helps ensure compute capacity is available when needed
Use Cases:
• Applications with steady state or predictable usage
• Applications that require reserved capacity, including disaster recovery
Payment options:
• All Upfront: pay for use one time, no hourly fee, reduce costs 47%-65%
• Partial Upfront: partial payment, lower hourly rate, reduce costs 45%-63%
• No Upfront: lower hourly rate, reduce costs ~30%
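The choice between on-demand and reserved pricing comes down to how many hours the instance actually runs. A rough Python sketch using the figures on this slide ($0.02/hour and the quoted discount ranges). It assumes a one-year term and treats each reserved option as a flat percentage off the on-demand cost of running all year, which is a simplification of real pricing:

```python
HOURS_PER_YEAR = 24 * 365
ON_DEMAND_RATE = 0.02  # USD per hour, the starting price quoted on the slide

# Discount ranges quoted on the slide, as fractions (low, high).
RESERVED_DISCOUNTS = {
    "All Upfront": (0.47, 0.65),
    "Partial Upfront": (0.45, 0.63),
    "No Upfront": (0.30, 0.30),
}

def on_demand_cost(utilization=1.0):
    """Yearly on-demand cost in USD when running a fraction of the year."""
    return ON_DEMAND_RATE * HOURS_PER_YEAR * utilization

def reserved_cost(discount):
    """Yearly reserved cost in USD; paid whether or not the instance runs."""
    return ON_DEMAND_RATE * HOURS_PER_YEAR * (1.0 - discount)

def break_even_utilization(discount):
    """Fraction of the year above which reserved beats on-demand."""
    return 1.0 - discount

full_year = on_demand_cost()
for option, (low, high) in RESERVED_DISCOUNTS.items():
    print(option, round(reserved_cost(high), 2), "to", round(reserved_cost(low), 2), "USD/year")
print("on-demand, full year:", round(full_year, 2), "USD")
```

Under these assumptions a ~30% discount only pays off for an instance that is busy more than about 70% of the year, which is why the slide pairs reserved instances with steady-state workloads and on-demand with spiky ones.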
59. Spot instances
Bid on unused Amazon EC2 capacity
Spot Price based on supply/demand, determined automatically
Cost / Large Scale, dynamic workload handling
Use Cases:
• Applications with flexible start and end times
• Applications only feasible at very low compute prices
62. Leverage Spot instances in workflows
Harvard Medical School
The Laboratory for Personalized Medicine
Run EC2 clusters to analyze entire genomes
1 day's worth of effort resulted in 50% savings in cost
“The AWS solution is stable, robust, flexible, and low cost. It has everything to recommend it.”
Dr. Peter Tonellato, LPM, Center for Biomedical Informatics, Harvard Medical School
http://aws.amazon.com/solutions/case-studies/harvard/
63. http://bit.ly/aws-dbgap
Architecting for Genomic Data Security and
Compliance in AWS
Creating Healthcare Data Applications to Promote
HIPAA and HITECH Compliance
http://bit.ly/aws-hipaa
http://bit.ly/aws-hipaa-faq
65. S3 and Amazon EMR: very high, non-blocking, parallel bandwidth
  1. Put data in S3
  2. Start a cluster (Hadoop, SGE, custom)
  3. Get the results
67. Computational compound analysis
Solar panel material
Estimated serial computation time 264 years
156,314 core cluster across 8 regions
1.21 petaFLOPS (Rpeak)
Simulated 205,000 materials
18 hours for $33,000
16¢ per molecule
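The headline numbers on this slide can be tied together with a few lines of arithmetic. A small Python sketch using only the figures quoted above; the 365-day year is an assumption made to convert the serial estimate into hours:

```python
materials = 205_000      # simulated materials
cost_usd = 33_000        # total cost of the run
wall_hours = 18          # wall-clock duration
serial_years = 264       # estimated serial computation time
cores = 156_314          # cores in the cluster

cost_per_material = cost_usd / materials         # USD per molecule
serial_hours = serial_years * 365 * 24           # assumes 365-day years
speedup = serial_hours / wall_hours              # versus the serial estimate
parallel_efficiency = speedup / cores            # fraction of ideal scaling
cost_per_core_hour = cost_usd / (cores * wall_hours)

print(f"cost per material: ${cost_per_material:.2f}")
print(f"speedup over serial estimate: {speedup:,.0f}x")
print(f"parallel efficiency: {parallel_efficiency:.2f}")
print(f"cost per core-hour: ${cost_per_core_hour:.4f}")
```

The per-molecule cost rounds to the 16¢ on the slide, and the implied speedup is a little below the core count, as expected for a run of this size.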
Editor's Notes
Note: This slide lists services that were launched in a given year. It is for illustrative purposes and may not be a complete list.
TALKING POINTS
AWS has been located in the Leaders quadrant every year since Gartner began the Cloud IaaS MQ four years ago.
Gartner stated that AWS has more than five times the compute capacity in use as the aggregate total of the other fourteen providers in this Magic Quadrant.
Gartner identified AWS as the provider most commonly selected for strategic adoption.
Gartner recommends clients use AWS for all evaluated use cases, including enterprise applications, cloud-native applications, batch computing, e-business hosting, general business applications, and test and development.
Notably, AWS is the only “Leader” recommended for enterprise applications.
http://aws.amazon.com/ec2/instance-types/
GP: Small and mid-size databases, data processing, encoding, caching, SAP, Microsoft SharePoint and other enterprise applications.
Compute Optimized: High-traffic web applications, ad serving, batch processing, video encoding, distributed analytics, high-energy physics, genome analysis, and computational fluid dynamics.
Memory Optimized: High performance databases, distributed memory caches, in-memory analytics, genome assembly and analysis, and larger deployments of SAP, Microsoft SharePoint and other enterprise applications
GPU: G2 popular use cases: Game streaming, 3D application streaming, and other server-side graphics workloads.
CG1 popular use cases: Computational chemistry, rendering, financial modeling, and engineering design.
Storage Optimized: I2 and HI1 popular use cases: NoSQL databases like Cassandra and MongoDB, and scale out transactional databases.
HS1 popular use cases: Data warehousing, Hadoop, and cluster file systems.
Micro: T1 popular use cases: Low traffic websites or blogs, small administrative applications, bastion hosts, and free trials to explore EC2 functionality
I2.8XL == 6.2TB SSD Storage
HS1.8XL == 48TB
CG1 released in November 2010
Leverage a large ecosystem of tools
There’s a shared responsibility to accomplish security and compliance objectives in AWS cloud. There are some elements that AWS takes responsibility for, and others that the customer must address. The outcome of the collaborative approach is positive results seen by customers around the world.
Include MFA in here.
Enterprises segregate important duties to reduce risk of accidental or malicious changes
AWS allows fine-grained segregation across virtually all aspects of the service
For example, you can segregate
Who can change network configuration
Who can change firewalls
Who can change how the VPC connects to the Internet or back to your corporate premises
Who can start and stop servers
Who can snapshot and restore storage volumes
AWS IAM offers a programmatic level of control and granularity that would not be possible to implement in traditional on-premises environments
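The segregation of duties described in these notes can be expressed as one IAM policy per role, each allowing a disjoint set of actions. An illustrative Python sketch that builds such policy documents as plain dictionaries. The EC2 action names are real IAM actions, but the role names, the grouping, and the use of "Resource": "*" are assumptions made for brevity, not a recommended production policy:

```python
import json
from itertools import combinations

# Hypothetical roles mapped to the duties listed in the notes.
ROLES = {
    "network-admin": {"ec2:CreateRoute", "ec2:AttachInternetGateway", "ec2:CreateVpnConnection"},
    "firewall-admin": {"ec2:AuthorizeSecurityGroupIngress", "ec2:RevokeSecurityGroupIngress"},
    "server-operator": {"ec2:StartInstances", "ec2:StopInstances"},
    "storage-operator": {"ec2:CreateSnapshot", "ec2:CreateVolume"},
}

def policy(actions):
    """Build an IAM policy document that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": "*"}
        ],
    }

def duties_are_segregated(roles):
    """True when no action is granted to more than one role."""
    return all(a.isdisjoint(b) for a, b in combinations(roles.values(), 2))

policies = {name: policy(actions) for name, actions in ROLES.items()}
print(json.dumps(policies["server-operator"], indent=2))
print("segregated:", duties_are_segregated(ROLES))
```

Keeping the role definitions in code like this makes the separation itself something that can be checked before any policy is attached.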
Need a better architecture diagram graphic on the right.
Intel® AES New Instructions (AES-NI): The Intel AES-NI instruction set accelerates the Advanced Encryption Standard (AES) algorithm in hardware, providing faster data protection and greater security.
Refer back to DNAnexus implementation of encryption for S3 for data, EBS for metadata.
CloudTrail can help you achieve many tasks
Security analysis
Track changes to AWS resources, for example IAM, VPC security groups and NACLs
Compliance – understand AWS API call history
Troubleshoot operational issues – quickly identify the most recent changes to your environment
Take home message:
Be flexible with the type of instance you can run on.
Be flexible on where you can run your analyses.
S3, as a regional service, provides data access across AZs
Cohorts for Heart and Aging Research in Genomic Epidemiology project (CHARGE)
200 researchers across 5 institutions
Working to identify genes that contribute to aging and heart disease
DNA sequence of 14,000 individuals -- 3,751 whole genomes and 10,771 whole exomes
2.4 million core-hours of computational time
generated 440 TB (terabytes) of results
Nearly a petabyte of total storage