2017 12 lab informatics summit

The cloud journey, with a focus on genomics and drug discovery

Published in: Science

  1. Leveraging the cloud to transform and streamline informatics processes. Laboratory Informatics Summit, December 5, 2017. Chris Dwan (chris@dwan.org)
  2. “Traveller, there is no path. The path is made by walking.” Antonio Machado
  3. Conclusions: Nobody cares about the cloud. People care about business, scientific, and clinical outcomes. “Cloud” is a means to an end. Nothing more.
  6. Enterprise CIO / Sr. Director, Research IT
  7. “The future is already here – it’s just not very well distributed.” William Gibson
  8. My Cloud Journey:
       • 1998: Army Research Lab. Java framework to distribute a target recognition workflow across multiple DoD research sites.
       • 2002: Minnesota Center for Computational Biology and Genomics. Campus-wide “grid” unifying three compute clusters to run BLAST analyses for crop genomics.
       • 2008: BioTeam. “Inquiry” HPC product ported to AWS. My first real “infrastructure as code” moment.
       • 2012: New York Genome Center. Work to make a new genome center “cloud ready” (though limited initial adoption).
       • 2014 – 2017: Broad Institute of MIT and Harvard. Transition production genomics workflows to Google’s cloud.
  9. Geek Cred: My First Petabyte, 2008
  11. 2012: On-premises petabytes are no longer so interesting to me.
  13. Genomic Data Production in Context: genomic data production @ Broad (plot). I joined the Broad in 2014. Caveat: this plot looked similarly scary back in 2007.
  14. Geek Cred: My First Exabyte, 2014
  15. Genomes on the Cloud (April 2016): testing the genome analysis pipeline; “Go-live”.
  16. 8 months in the cloud
  18. “If you aim for simplicity, master complexity.” The Mustard Seed Garden Manual of Painting, 1679
  23. Senior leadership and “cloud”:
       • Removes a major support burden from in-house staff
       • Automatic technology updates rather than annual fire-drills
       • Vastly simplified licensing and budget planning
       • Unlimited scale, no more forklift upgrades
       • Products are familiar to the end-user rather than opaque technology
  25. What is the cloud? “Amazon Web Services is the cloud”* Chris Dagdigian, Bio-IT World, November 2009. (* He has revised this opinion in the last 8 years.)
  26. What is the cloud? “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” NIST Special Publication 800-145
  31. Pizza as a Service (credit: everybody on the Internet). Four columns, each listing the same components (Cheese, Tomato Sauce, Pizza Dough, Fire, Oven, Electricity / Gas, Drinks, Table), with each component marked “You Manage” or “Vendor Manages”:
       • Homemade: On-Premises (legacy!)
       • Take and Bake: Infrastructure as a Service (IaaS)
       • Delivery: Platform as a Service (PaaS)
       • Restaurant: Software as a Service (SaaS)
  32. Cloud-based killer apps:
       • Team chat / messaging: Slack, Skype, Hipchat, …
       • File sharing: OneDrive, Dropbox, Box, Egnyte, Google Drive, …
       • Video conferencing: Zoom, Chime, Skype, Hangouts, …
       • Office productivity: G Suite, Office 365
       • Databases: both SQL and NoSQL
  35. Maslow’s Hierarchy of Needs. Levels: Friendship, connectedness, belonging; Confidence, achievement; Creativity, Purpose; Safety, physical and economic stability; Air, food, shelter, sleep; Wireless Internet, fully charged battery. Annotations: “If you lack this” / “You don’t get to engage here”.
  36. IT Hierarchy of Needs. Levels: Productivity and Security, Applications, disaster preparedness; Automation and compliance; “Thought Partner”; Files, formats, naming conventions, access controls; Phones, Projectors, Internet, Email, Chat; Power, Building Access, Laptops, Wifi, Identity. Annotations: “If you lack this” / “You don’t get to engage here”.
  37. Diagram labels: Office; Co-located Data Center; Cloud; Hosted Legacy Architecture; Silos of Files; ServerFarm; Sysadmin Team; Data Center Team.
  40. Diagram labels: Office; Colocated Data Center; Cloud; Hosted Legacy Architecture; Active Directory Master; Sysadmin Team; Data Center Team; AWS US-East-2; Silos of Posix Storage; ServerFarm. Banner: “ALL NEW! 70% MORE CLOUD!” Also shown: Removes a major support burden from in-house staff; Vastly simplified licensing and budget planning; Automatic technology updates rather than annual fire-drills. Merely virtualizing your infrastructure provides none of the executive-level benefits of “cloud”.
  41. What about the data?
  42. Elasticity:
       • Compute: Wal-Mart parking lot; spiky, unpredictable demand; elasticity in compute is capacity. For variable compute needs and agility, cloud compute is a slam-dunk.
       • Data: grows without bound; elasticity in data is mobility and latency. Egress charges and lock-in present a structural challenge for cloud as a long-term data storage strategy.
  44. The right side of history:
       • Applications are containerized (Docker, Singularity)
       • Data is accessed RESTfully (S3)
       • Identity management is federated (OAuth2, …)
       • Analytics are ubiquitous (HDFS / Spark)
       • Public clouds (AWS, GCS, Azure) provide flexible commodity infrastructure and surge capacity
       • Data flow operations adopt serverless architectures (Lambda)
       • Technologists are embedded in project teams (DevOps / staff rotations)
      This is a multi-year journey. Start today.
  45. “The opposite of play is not work, it’s depression.” Jane McGonigal, Reality Is Broken
  46. Financial Governance $$ !!
  47. Financial Controls:
       • Shifting from CapEx to OpEx can put spending power in the hands of individual contributors, with little to no oversight.
       • Cloud providers have robust tools for setting and tracking budgets, but you must use them.
  50. Data Deletion @ Scale. Me: “Blah Blah … I think we’re cool to delete about 600TB of data from a cloud bucket. What do you think?” Ray: “BOOM!”
       • This was my first deliberate data deletion at this scale.
       • It scared me how fast / easy it was.
       • Look for single accounts / roles that can destroy everything.
  51. Identity and Authorization
  52. Compliance and Security:
       • Compliance: Things have changed a lot since 2014. All major cloud providers will now sign a BAA and share liability. All major cloud providers can now support HIPAA, HITECH, FISMA, and other audit standards.
       • Security: Cloud-based systems can be substantially more secure than on-premises systems. They can also be substantially less secure.
  53. “Premature optimization is the root of all evil (or at least most of it).” Donald Knuth, Computer Programming as an Art, 1975
  58. Specific Recommendations:
       • Do not waste time on IaaS vendor bake-offs. Choose one (GCS, AWS, Azure) based on in-house expertise and enterprise relationships.
       • Do not expect “cloud” to make things simpler or cheaper on day one. There will be substantial work to deploy any useful “as a service” product for your particular process.
       • Hosted legacy doesn’t cut it. Achieving the benefit of cloud technologies will require you to re-architect your legacy systems and re-tool your development / deployment processes.
       • Trust the lab, seriously. If they cling to Excel, it means that Excel is better from their perspective. Ask them. They do not care about the cloud.
       • When in doubt, focus on the basics. Don’t overthink it.
  59. “If you have four groups working on a compiler, you’ll get a four-pass compiler.” Eric S. Raymond, The New Hacker’s Dictionary, 1996
  64. Day One Commitments:
       • Centralize identity: Integrate AD / Centrify / Okta. Yes, the lab account too.
       • Roles, not individuals: You will eventually have to clean it up.
       • Automate your archives: Unless it’s sequencing or imaging, dump it all to S3. 1TB on full-fare S3 is $25/month. Don’t optimize yet.
       • Capture metadata: Scrape headers and whatever you can find into a simple database (NoSQL is fine). Include links to the S3 archive.
       • Curate: Establish a regular meeting to review data architecture and cloud costs.
  65. This stuff is important. We have an opportunity to change lives and health outcomes, and to realize the gains of genomic medicine, this year. We also have an opportunity to waste vast amounts of money and still not really help the world. I would like to work together with you to build a better future, sooner. chris@dwan.org
  66. Thank You. chris@dwan.org https://dwan.org
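
Slide 50 advises looking for single accounts or roles that can destroy everything. The talk shows no code for this, so what follows is only a minimal sketch of such an audit. It assumes role permissions are available as statements loosely modeled on AWS IAM policy JSON. The role names, bucket names, and the `ROLES` table are hypothetical, and the check is deliberately simplified: it ignores Deny statements and policy conditions, and it matches action names case-sensitively.

```python
from fnmatch import fnmatchcase

# Hypothetical role -> policy statements, loosely modeled on AWS IAM policy JSON.
ROLES = {
    "lab-uploader": [
        {"Effect": "Allow", "Action": ["s3:PutObject"],
         "Resource": ["arn:aws:s3:::lab-archive/*"]},
    ],
    "pipeline-admin": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
    "cleanup-bot": [
        {"Effect": "Allow", "Action": ["s3:DeleteObject"],
         "Resource": ["arn:aws:s3:::scratch/*"]},
    ],
}

# Actions treated as destructive in this sketch.
DESTRUCTIVE_ACTIONS = ("s3:DeleteObject", "s3:DeleteBucket")


def _as_list(value):
    """Policy fields may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def can_destroy_everything(statements):
    """True if any Allow statement grants a delete action on every resource."""
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue  # simplification: Deny statements are not evaluated
        patterns = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        grants_delete = any(
            fnmatchcase(action, pattern)
            for pattern in patterns
            for action in DESTRUCTIVE_ACTIONS
        )
        if grants_delete and "*" in resources:
            return True
    return False


def risky_roles(roles):
    """Names of roles that could delete data everywhere, sorted for review."""
    return sorted(name for name, stmts in roles.items()
                  if can_destroy_everything(stmts))


if __name__ == "__main__":
    print(risky_roles(ROLES))
```

A role that may delete only within a scoped bucket, like the hypothetical `cleanup-bot`, is not flagged. Only unscoped delete rights are.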

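The “Automate your archives” and “Capture metadata” commitments on slide 64 can be sketched in a few lines. This is an illustration under stated assumptions, not the speaker's implementation. SQLite stands in for the “simple database” (the slide says NoSQL is fine). The table layout, function names, and bucket name are hypothetical, and nothing is uploaded: the code only records the S3 URI an archived file would have. The cost helper uses the $25 per TB per month figure quoted on the slide, which is a 2017 number and not current pricing.

```python
import sqlite3
from pathlib import Path

# Figure quoted on slide 64 (2017); an assumption here, not current pricing.
S3_DOLLARS_PER_TB_MONTH = 25.0


def monthly_archive_cost(terabytes, rate=S3_DOLLARS_PER_TB_MONTH):
    """Estimated monthly storage cost in dollars at a flat per-TB rate."""
    return terabytes * rate


def open_catalog(db_path=":memory:"):
    """Open (or create) the metadata catalog. The schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "s3_uri TEXT PRIMARY KEY, name TEXT, size_bytes INTEGER, header TEXT)"
    )
    return conn


def record_file(conn, path, bucket, prefix=""):
    """Scrape the first line of a file and store it with a link to the archive.

    The S3 URI is only recorded; this sketch performs no upload.
    """
    path = Path(path)
    with path.open("r", errors="replace") as fh:
        header = fh.readline().rstrip("\n")
    s3_uri = f"s3://{bucket}/{prefix}{path.name}"
    conn.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
        (s3_uri, path.name, path.stat().st_size, header),
    )
    conn.commit()
    return s3_uri
```

Keying the table on the S3 URI means a nightly re-scan simply overwrites earlier rows, in keeping with the “don’t optimize yet” advice.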