8. The future is already here – it’s just not very well distributed
William Gibson
9. My Cloud Journey
1998: Army Research Lab
– Java framework to distribute a target recognition workflow across multiple DoD research sites
2002: Minnesota Center for Computational Biology and Genomics
– Campus-wide “grid” unifying three compute clusters to run BLAST analyses for crop genomics
2008: BioTeam
– “Inquiry” HPC product ported to AWS
– My first real “Infrastructure as code” moment
2012: New York Genome Center
– Work to make a new genome center “cloud ready” (though limited initial adoption)
2014 – 2017: Broad Institute of MIT and Harvard
– Transition production genomics workflows to Google’s cloud
10. Geek Cred: My First Petabyte, 2008
14. Genomic Data Production in Context
Genomic data production @ Broad
I joined the Broad in 2014
Caveat: This plot looked similarly scary back in 2007
15. Geek Cred: My First Exabyte, 2014
16. Genomes on the Cloud (April 2016)
Testing the genome analysis pipeline
“Go-live”
25. Senior leadership and “cloud”
• Removes a major support burden from in-house staff
• Automatic technology updates rather than annual fire-drills
• Vastly simplified licensing and budget planning
• Unlimited scale, no more forklift upgrades
• Products are familiar to the end-user rather than opaque technology
26. What is the cloud?
“Amazon Web Services is the cloud”*
Chris Dagdigian
Bio-IT World, November 2009
* He has revised this opinion in the last 8 years
27. What is the cloud?
“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
NIST Special Publication 800-145
32. Pizza as a Service
Columns, left to right:
• Homemade: On-Premises (legacy!)
• Take and Bake: Infrastructure as a Service (IaaS)
• Delivery: Platform as a Service (PaaS)
• Restaurant: Software as a Service (SaaS)
Layers in every column: Cheese, Tomato Sauce, Pizza Dough, Fire, Oven, Electricity / Gas, Drinks, Table
[Figure: each layer in each column is shaded as either “You Manage” or “Vendor Manages”]
Credit: Everybody on the Internet.
33. Cloud based killer apps
• Team chat / messaging: Slack, Skype, HipChat, …
• File sharing: OneDrive, Dropbox, Box, Egnyte, Google Drive, …
• Video conferencing: Zoom, Chime, Skype, Hangouts, …
• Office productivity: G Suite, Office 365
• Databases: Both SQL and NoSQL
36. Maslow’s Hierarchy of Needs
Pyramid, top to bottom:
• Creativity, Purpose
• Confidence, achievement
• Friendship, connectedness, belonging
• Safety, physical and economic stability
• Air, food, shelter, sleep
• Wireless Internet, Fully charged battery
If you lack a lower level, you don’t get to engage at the levels above it.
37. IT Hierarchy of Needs
Pyramid, top to bottom:
• “Thought Partner”
• Automation and compliance
• Productivity and Security, Applications, disaster preparedness
• Files, formats, naming conventions, access controls
• Phones, Projectors, Internet, Email, Chat
• Power, Building Access, Laptops, Wifi, Identity
If you lack a lower level, you don’t get to engage at the levels above it.
41. Cloud Hosted Legacy Architecture
[Diagram labels: Office; Colocated Data Center; AWS US-East-2; Active Directory Master; Sysadmin Team; Data Center Team; Silos of POSIX Storage; Server Farm]
ALL NEW! 70% MORE CLOUD!
• Removes a major support burden from in-house staff
• Vastly simplified licensing and budget planning
• Automatic technology updates rather than annual fire-drills
Merely virtualizing your infrastructure provides none of the executive level benefits of “cloud”
43. Elasticity
Compute:
– Wal-mart parking lot
– Spiky, unpredictable demand
– Elasticity in compute is capacity
– For variable compute needs and agility, cloud compute is a slam dunk.
Data:
– Grows without bound
– Elasticity in data is mobility and latency
– Egress charges and lock-in present a structural challenge for cloud as a long term data storage strategy.
45. The right side of history
• Applications are containerized (Docker, Singularity)
• Data is accessed RESTfully (S3)
• Identity management is federated (OAuth2, …)
• Analytics are ubiquitous (HDFS / Spark)
• Public clouds (AWS, GCP, Azure) provide flexible commodity infrastructure and surge capacity
• Data flow operations adopt serverless architectures (Lambda)
• Technologists are embedded in project teams (DevOps / staff rotations)
This is a multi-year journey.
Start today.
46. The opposite of play is not work, it’s depression
Jane McGonigal, Reality Is Broken
48. Financial Controls
• Shifting from CapEx to OpEx can put spending power in the hands of individual contributors, with little to no oversight.
• Cloud providers have robust tools for setting and tracking budgets, but you must use them.
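A budget only helps if someone checks spend against it before the month ends. The following is a minimal sketch, not any provider's API: it assumes you can already obtain month-to-date spend from your provider's billing export, and the function name, the dataclass, and the dollar figures are all hypothetical. It projects month-end spend by straight-line extrapolation and flags a likely overrun.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass
class BudgetStatus:
    projected: float   # straight-line month-end projection, in dollars
    budget: float      # monthly budget, in dollars
    over_budget: bool  # True when the projection exceeds the budget


def project_month_end(spend_to_date: float, today: date, budget: float) -> BudgetStatus:
    """Linearly extrapolate month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day  # average spend per elapsed day
    projected = daily_rate * days_in_month
    return BudgetStatus(projected, budget, projected > budget)


# Hypothetical numbers: $4,000 spent by June 10 against a $10,000 budget.
status = project_month_end(spend_to_date=4_000.0, today=date(2017, 6, 10), budget=10_000.0)
print(status)
```

A straight-line projection is crude for spiky workloads, but it is enough to start a conversation before the invoice arrives.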
49. Data Deletion @ Scale
Me: “Blah Blah … I think we’re cool to delete about 600 TB of data from a cloud bucket. What do you think?”
50. Data Deletion @ Scale
Ray: “BOOM!”
51. Data Deletion @ Scale
• This was my first deliberate data deletion at this scale.
• It scared me how fast / easy it was.
• Look for single accounts / roles that can destroy everything.
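One way to start looking for such roles is to scan policy documents for broad delete rights. The sketch below assumes policies in the AWS IAM JSON layout (`Statement`, `Effect`, `Action`, `Resource`) have already been exported into Python dictionaries. The role names and the helper `can_delete_everything` are hypothetical, and the check is deliberately naive: it ignores `Deny` statements, `NotAction`, conditions, and resource patterns other than a bare `*`.

```python
from fnmatch import fnmatchcase

# Actions treated as destructive in this sketch (an assumption; extend as needed).
DESTRUCTIVE_ACTIONS = ("s3:DeleteObject", "s3:DeleteBucket")


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def can_delete_everything(policy: dict) -> bool:
    """True if any Allow statement grants a destructive action on Resource '*'."""
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" not in _as_list(stmt.get("Resource")):
            continue
        for pattern in _as_list(stmt.get("Action")):
            # Action names are case-insensitive and may contain wildcards.
            if any(fnmatchcase(a.lower(), pattern.lower()) for a in DESTRUCTIVE_ACTIONS):
                return True
    return False


# Hypothetical roles for illustration only.
roles = {
    "lab-admin": {"Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]},
    "pipeline-reader": {
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"], "Resource": "*"}
        ]
    },
}
risky = sorted(name for name, policy in roles.items() if can_delete_everything(policy))
print(risky)
```

Anything this flags is a candidate for scoping down or for requiring a second person before deletion.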
53. Compliance and Security
Compliance:
– Things have changed a lot since 2014.
– All major cloud providers will now sign a BAA and share liability
– All major cloud providers can now support HIPAA, HITECH, FISMA, and other audit standards
Security:
– Cloud based systems can be substantially more secure than on-premises systems.
– They can also be substantially less secure.
54. Premature optimization is the root of all evil (or at least most of it)
Donald Knuth – Computer Programming as an Art, 1974
59. Specific Recommendations
Do not waste time on an IaaS vendor bake-off.
– Choose one (GCP, AWS, Azure) based on in-house expertise and enterprise relationships.
Do not expect “cloud” to make things simpler or cheaper on day one.
– There will be substantial work to deploy any useful “as a service” product for your particular process.
Hosted legacy doesn’t cut it.
– Achieving the benefit of cloud technologies will require you to re-architect your legacy systems and re-tool your development / deployment processes.
Trust the lab, seriously.
– If they cling to Excel, it means that Excel is better from their perspective.
– Ask them. They do not care about the cloud.
When in doubt, focus on the basics. Don’t overthink it.
60. If you have four groups working on a compiler, you’ll get a four pass compiler
Eric S. Raymond, The New Hacker’s Dictionary, 1996
65. Day One Commitments
Centralize Identity:
– Integrate AD / Centrify / Okta.
– Yes, the lab account too.
– Roles, not Individuals
– You will eventually have to clean it up
Automate your archives:
– Unless it’s sequencing or imaging, dump it all to S3.
– 1 TB on full fare S3 is $25/month. Don’t optimize yet.
Capture Metadata:
– Scrape headers and whatever you can find into a simple database (NoSQL is fine)
– Include links to the S3 archive.
Curate:
– Establish a regular meeting to review data architecture and cloud costs.
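The metadata and cost commitments can be sketched together in a few lines. This is an illustration under stated assumptions, not a recommended schema: it uses Python's built-in SQLite with headers stored as JSON text in place of a NoSQL store, the bucket name and header fields are made up, and the $25 per TB per month figure is simply the one quoted above (check current pricing). Each record links scraped headers to the archived object's S3 URI, and the catalog doubles as input for the regular cost review.

```python
import json
import sqlite3

S3_DOLLARS_PER_TB_MONTH = 25.0  # figure quoted on the slide; an assumption, not a price list
BYTES_PER_TB = 10**12


def open_catalog(path=":memory:"):
    """Open (or create) the catalog; one row per archived object."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "s3_uri TEXT PRIMARY KEY, size_bytes INTEGER NOT NULL, headers TEXT NOT NULL)"
    )
    return db


def record_file(db, s3_uri, size_bytes, headers):
    """Store scraped headers as JSON alongside the link to the S3 archive."""
    db.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
        (s3_uri, size_bytes, json.dumps(headers)),
    )
    db.commit()


def find_by_header(db, key, value):
    """Return S3 URIs whose scraped headers contain key == value."""
    rows = db.execute("SELECT s3_uri, headers FROM files ORDER BY s3_uri")
    return [uri for uri, headers in rows if json.loads(headers).get(key) == value]


def monthly_storage_cost(db):
    """Estimate monthly cost of everything in the catalog at the flat rate above."""
    (total_bytes,) = db.execute("SELECT COALESCE(SUM(size_bytes), 0) FROM files").fetchone()
    return total_bytes / BYTES_PER_TB * S3_DOLLARS_PER_TB_MONTH


# Hypothetical bucket, paths, and header fields.
db = open_catalog()
record_file(db, "s3://example-lab-archive/plate_reader/run_0001.csv",
            2 * 10**12, {"instrument": "plate_reader"})
record_file(db, "s3://example-lab-archive/notebooks/2017-06.zip",
            1 * 10**12, {"instrument": "none"})
print(find_by_header(db, "instrument", "plate_reader"))
print(monthly_storage_cost(db))
```

Starting with a flat table keeps the day-one effort small; the schema can be refined once the review meeting shows which questions people actually ask of the catalog.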
66. This stuff is important
We have an opportunity to change lives and health outcomes, and to realize the gains of genomic medicine, this year.
We also have an opportunity to waste vast amounts of money and still not really help the world.
I would like to work together with you to build a better future, sooner.
chris@dwan.org