
2019 BioIt World - Post cloud legacy edition

In 2019, genomics is cool with the cloud. This deck provides actionable advice on what to do with legacy cloud infrastructure, and what's coming next.

Published in: Technology

  1. Cloud Transformation 2.0: Embracing the Multi-Cloud Future
     March 12, 2019: Bio-IT World West
     Chris Dwan (chris@dwan.org), https://dwan.org
  2. Conclusions
     • In 2019, we’re all “cool with the cloud”
     • Premature optimization is still terrible
     • Make it work, make it fast, make it cheap
     • Experimentation and engineering are very different practices
     • Great policy makes great systems
     • This continues to be an amazing time to be an infrastructure / data nerd in health care / life science
  3. Geek Cred: My First Petabyte (NASA, 2008)
  5. NIH circa 2008
  6. The evolution of data transfer …
  7. Data Explosion: Genomic Data Production in Context
     I did research computing at Broad from 2014 to 2017.
  8. Geek Cred: My First Exabyte, 2014
     Note that this exabyte is empty. Broad’s data is nowhere near exascale.
  9. Cloud Definitions
     • Public cloud: AWS, Azure, GCS, plus a bunch of wannabes.
     • Private cloud: Cloud services on gear you own, which may be hosted at a nice data center somewhere.
     • Fog computing: On-premises equipment used for cloud stuff. It’s fog because that’s a cloud that’s close to earth. Get it?
     • Hybrid cloud: Bursting to a public cloud for extra capacity.
     • Multi cloud: Azure for business, AWS for burst / scalability, Google for that one weird trick.
     • Enterprise cloud: IT trying desperately to align with a cloud strategy by changing the labels on the PowerPoint.
     • “On premises,” or “legacy,” carrot cake still has a place, even in homes with a cake-as-a-service strategy.
     • Slide gauges: Hype-o-meter, Impact-o-meter.
  10. The Cloud Is a Big Place: Global IaaS Providers
  11. Comparing the Big Three
      AWS:
      • Uncontested heavyweight champion in terms of scale, maturity of services, and adoption.
      • Services based on the market. Default offerings may not be a good fit for odd-shaped research computing problems.
      • Market dominance means little incentive to provide discounts or customization.
  12. Comparing the Big Three
      Google:
      • Focused on value-add platforms.
      • Enthusiastic partner and sponsor in areas of interest to $GOOG.
      • Potential conflicts of interest in areas of interest to $GOOG.
      • Like something out of Greek mythology, consumes ecosystem partners whole.
  13. Comparing the Big Three
      Microsoft:
      • Your CIO already has a regular meeting with the Microsoft enterprise sales rep.
      • Microsoft is already a qualified vendor in your purchasing systems.
      • Decades of experience with regulatory compliance and governance.
      • Already provides your identity, authorization, and (probably) office productivity.
      • Strategic purchases in HPC / ML / AI.
  15. Specific Advice on The Big Three
      • Public cloud is an agility play, not a cost play.
      • AWS, GCS, and Azure have very similar capabilities and pricing, even at scale.
      • Pick one and get good at it.
      • Don’t be afraid of running experiments.
      • Avoid 2nd tier cloud providers unless there is an unambiguous business or capability reason to use them.
      • Track spending, even when it’s “free.”
  16. The Cloud Is a Big Place: Global IaaS Providers, Domain Specific PaaS
  17. The Cloud Is a Big Place: Global IaaS Providers, Domain Specific PaaS
      Your CIO is not thinking of HPC or research computing when articulating their cloud strategy.
  18. The Cloud Is a Big Place: Global IaaS Providers, Analytics Framework, Domain Specific PaaS
      Analysis platforms deserve their own slide deck.
  19. Metaphor: Pizza as a Service
      Homemade        On-Premises (legacy!)
      Take and Bake   Infrastructure as a Service (IaaS)
      Delivery        Platform as a Service (PaaS)
      Restaurant      Software as a Service (SaaS)
      Layers listed under every model: Cheese, Tomato Sauce, Pizza Dough, Fire, Oven, Electricity / Gas, Drinks, Table.
      Legend: each layer is marked either “You Manage” or “Vendor Manages.”
      Credit: Everybody on the Internet.
  21. The Cloud Is a Big Place: Broad Firecloud; Data Platform; Global IaaS Providers; Analytics Framework; Domain Specific PaaS
      Data platforms are where it’s at right now.
  22. One common thread: “Why Not Do Both?”
      UC Health System Data Warehouse:
      • Shared data warehouse
      • AND local instances at hospitals
      NIH:
      • World class dedicated HPC / networks
      • AND negotiated discounts with public cloud providers
      GenePattern Networks:
      • Free autoscaling environment on AWS
      • AND support workstation / local HPC
  23. The Policies You Need
      • Appropriate usage. Human readable: Expectations of privacy and standards of behavior.
      • Data Classification. Governance: Defines the major categories of data (corporate sensitive, clinical, …) and standards for handling of each.
      • Written Information Security Policy (WISP). Technical: Defines how systems will be configured to protect sensitive data and operations.
      • Vendor Qualification. Business SOP to establish practices around vendor access and management.
      Real world policy impact: Because bicycle lanes are “traffic lanes,” the argument about snow plowing is simple, which saves lives.
  24. Practical advice on Cloud Systems
      Make it work
      • Use dedicated instances (full price) until you’re sure the software works.
      • Don’t overthink it: increase RAM and local disk to overcome crashing.
      • Tear down / rebuild the entire infrastructure from time to time, even in dev.
      • All systems (yes, even cloud systems) have limits. Stop whining and learn them.
      • Any time you increase throughput by an order of magnitude, your system will break.
      Then make it fast
      • Profiling tools are your friend, automation is not.
      • Benchmark on real data. Imputed and synthetic data just echo your own assumptions back to you.
      Then make it cheap
      • Now you get to turn on spot instances.
      • This is the first time I ever want to hear about Glacier or Infrequently Accessed tiers of data.
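For the “make it fast” step on slide 24, profiling can start with Python's standard library. The sketch below is illustrative only: `slow_sum_of_squares`, `pipeline`, and `profile_call` are hypothetical names that do not come from the talk, and the toy workload stands in for a real job that, as the slide advises, should be measured on real data.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop standing in for a real analysis step (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline(n):
    """Hypothetical workload: calls the hot spot five times and sums the results."""
    return sum(slow_sum_of_squares(n) for _ in range(5))


def profile_call(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(pipeline, 100_000)
    print(report)
```

The report lists where cumulative time is spent, which is the evidence to gather before any optimization or cost tuning.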
  25. Engineering is different than experimentation
      “Practice does not make perfect. Practice makes permanent.” (Attributed to Yo-Yo Ma)
      Diagram labels: Application Repo, Production, Infrastructure Repo, Build, Test.
      • Development can rely on production.
      • Production cannot rely on development.
      • Reference datasets are a prod resource.
      • No manual intervention in either test or prod.
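The rule on slide 25 that development can rely on production, but production cannot rely on development, can be checked mechanically. This is a minimal sketch under stated assumptions: components are tagged with an environment name in one dictionary, dependencies are recorded in another, and anything not tagged "prod" is treated as non-production. The function name and the example component names are hypothetical.

```python
def find_violations(environment_of, depends_on):
    """Return sorted (component, dependency) pairs where prod relies on non-prod.

    environment_of: dict mapping component name -> environment label, e.g. "prod" or "dev"
    depends_on:     dict mapping component name -> list of components it relies on
    """
    violations = []
    for component, dependencies in depends_on.items():
        if environment_of[component] != "prod":
            continue  # development may rely on anything, including production
        for dependency in dependencies:
            if environment_of[dependency] != "prod":
                violations.append((component, dependency))
    return sorted(violations)


if __name__ == "__main__":
    # Hypothetical inventory: reference datasets are tagged as a prod resource.
    environments = {
        "reference-genome": "prod",
        "variant-pipeline": "prod",
        "scratch-notebook": "dev",
        "experimental-index": "dev",
    }
    dependencies = {
        "variant-pipeline": ["reference-genome", "experimental-index"],
        "scratch-notebook": ["reference-genome"],
    }
    print(find_violations(environments, dependencies))
```

A check like this could run in the build step so that a production component picking up a development dependency fails before deployment.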
  27. Many Experiments, Few Projects
      Diagram: three stages (Feasibility, Development, Operations), each with an INBOX and an Active queue.
      “When there is too much to do, there is a strong tendency to engage in local reprioritization, meaning that each person in the process looks at the pile she is facing, determines which items are the most important, and then works on those tasks first. Local reprioritization creates variability. If a task happens to be prioritized by everyone, it gets done quickly. But, that means another task has been moved to the bottom of several “to do” lists and it might take weeks or months to get done.”
      No ability to predict turnaround times.
  28. FAIR Data (within the enterprise)
      Findable
      • NoSQL database of metadata and checksums
      • It’s plenty for a good long time.
      Accessible
      • Federated identity management
      • Architecture of S3 buckets and production “roles”
      Interoperable
      • Data standards, ontologies, strong policy framework, including electronic consents for human subjects data
      Reusable
      • “It’s much easier to go FAR than to go FAIR”
      Slide graphic labels: Catered Lunch; Data Lake; Open Bar; “Sense of well-being and contentment arising from realistic expectations.”
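The “Findable” item on slide 28, a NoSQL database of metadata and checksums, can be prototyped in a few lines. The sketch below is an assumption-laden illustration, not the system described in the talk: an in-memory dictionary stands in for the document store, SHA-256 is an arbitrary choice of checksum, and `Catalog`, `make_record`, and the metadata keys are hypothetical.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_record(path, **metadata):
    """Build one document: location, size, checksum, and free-form metadata."""
    return {
        "path": path,
        "size_bytes": os.path.getsize(path),
        "sha256": sha256_of_file(path),
        "metadata": dict(metadata),
    }


class Catalog:
    """In-memory stand-in for a NoSQL document store, keyed by checksum."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        """Store the record under its checksum and return that key."""
        self._records[record["sha256"]] = record
        return record["sha256"]

    def find(self, **criteria):
        """Return records whose metadata matches every key/value given."""
        return [
            record
            for record in self._records.values()
            if all(record["metadata"].get(k) == v for k, v in criteria.items())
        ]

    def verify(self, path):
        """Recompute the checksum of a file and report whether it is registered."""
        return sha256_of_file(path) in self._records
```

Keying on the checksum means the same bytes registered twice collapse to a single record, and `verify` gives a cheap integrity check after data is copied between storage tiers.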
  29. The Clinical Data Ecosystem
      There is an incredible wealth of data available to support both clinical care and research. Unfortunately, it is carved up and isolated in technical and social silos. There are both good and bad reasons for this segmentation, and it is holding us back.
      Data sources shown on the slide:
      • Patient Journals
      • Consumer products
      • Longitudinal Data from other providers …
      • Electronic Medical Records
      • Diagnostic Imaging
      • Clinical Notes
      • Hospital Telemetry
      • Primary Lab Data
      Annotations shown on the slide:
      • Incredible opportunities here, and rapidly developing data silos
      • Possibility of a self-normal (N of 1) over time
      • Natural language processing has strong potential
      • Innovations in the basics of clinical observation
      • Pressure to avoid incidental findings prevent bias
  30. A Personal Story
      I use a commercial service that combines labwork with wearable data. They provide insights and coaching. I have, personally, found this transformational in how I approach my health.
  35. Conclusions
      • In 2019, we’re all “cool with the cloud”
      • Premature optimization is still terrible
      • Make it work, make it fast, make it cheap
      • Strong distinction between experimentation and engineering
      • Great policy makes great platforms
      • This continues to be an amazing time to be an infrastructure / data nerd in health care / life science
  36. “The future is already here – it’s just not very well distributed.” (William Gibson)
