Big Data and Data Science: Opportunities for Biomedical Engineering

American Institute for Medical and Biological Engineering (AIMBE) presentation to the Academic Council. Washington, DC, April 8, 2018

Published in: Education

  1. Big Data and Data Science: Opportunities for Biomedical Engineering. Philip E. Bourne, PhD, FACMI; Stephenson Chair of Data Science; Director, Data Science Institute; Professor of Biomedical Engineering. peb6a@virginia.edu, https://www.slideshare.net/pebourne, @pebourne. 04/08/18, AIMBE Academic Council
  2. Disclaimer • This is mostly NOT a talk about my own research • It draws upon my now one-year-old view of NIH as the former Associate Director for Data Science (ADDS) • It suffers from my drinking my own Kool-Aid at the University of Virginia
  3. Take home (hopefully) • Increased awareness of the value of data science to your activities • Increased awareness of where NIH is headed • Some thoughts about how to build out data science in your own institutions
  4. Big data and data science are like the Internet… If I asked you to define them you would all say something different, yet you use them every day… http://vadlo.com/cartoons.php?id=357
  5. So what do I mean by big data/data science? • Use of the ever-increasing amount of open, complex, diverse digital data • Finding ways to ask and then answer relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition
  6. Cause • There are ~2.7 zettabytes (2.7 × 10^6 PB) of digital data • Volume is doubling every two years • Sheer volume of digital data, e.g., $1000 genome, wearable sensors, mandatory EHRs • New tools, e.g., deep artificial neural networks (DNNs) • New computing power, e.g., GPUs
  7. Effect • Big data currently estimated as a $50bn business – could save $3.1tn • 50% growth in data/yr; 5% growth in IT expenditure • US: 140,000-190,000 unfilled deep data analytics jobs • UVA DSI has 600 applicants this year for 50 spots; MSDS/MBA highly sought
  8. Effect ++ • Big data currently estimated as a $50bn business – could save $3.1tn – private sector research • 50% growth in data/yr; 5% growth in IT expenditure – undervalued • US: 140,000-190,000 unfilled deep data analytics jobs – competition for skilled researchers high • DSI has 600 applicants this year for 50 spots; MSDS/MBA highly sought – large human capital
  9. How much biomedical data? • Big Data – Total data from NIH-funded research in 2016 estimated at 650 PB* – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB in 2016 • Dark Data – Only 12% of data described in published papers is in recognized archives – 88% is dark data^ • Cost – 2007-2014: NIH spent ~$1.2bn extramurally on maintaining data archives • Notes: * In 2012 the Library of Congress was 3 PB; ^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
  10. Consider some current high-profile NIH examples where and how data science is being applied • Moonshot – platforms and integration, ML • MODs – automated curation • Human Microbiome Project – new cloud-based tools, ML • TOPMed – platforms and integration • All-of-Us – platforms and integration • ECHO – platforms and integration • BRAIN – ML • All: analytics, the Commons, FAIR, sustainability, workforce
  11. What of the future? One view is the 6D’s
  12. Example: photography (axes: Time vs. Volume, Velocity, Variety) • Digitization: digital camera invented by Kodak but shelved • Deception: megapixels & quality improve slowly; Kodak slow to react • Disruption: film market collapses; Kodak goes bankrupt • Demonetization: phones replace cameras • Dematerialization: Instagram, Flickr become the value proposition • Democratization: digital media becomes a bona fide form of communication • From a presentation to the Advisory Board to the NIH Director
  13. A call for making these data open • Mandates – NIH, NSF, Data Management Plans • Business models can be protected yet everyone benefits • It saves lives…
  14. Why a More Open Process? Use case: Diffuse Intrinsic Pontine Gliomas (DIPG) • Occur in 1:100,000 individuals • Peak incidence 6-8 years of age • Median survival 9-12 months • Surgery is not an option • Chemotherapy ineffective and radiotherapy only transitory • From Adam Resnick
  15. Timeline of genomic studies in DIPG • Landmark studies identify histone mutations as recurrent driver mutations in DIPG, ~2012 • Almost 3 years later, in largely the same datasets, but partially expanded, the same two groups and two others identify ACVR1 mutations as a secondary, co-occurring mutation • From Adam Resnick
  16. What do we need to do differently to reveal ACVR1? • ACVR1 is a targetable kinase • Inhibition of ACVR1 inhibited tumor progression in vitro • ~300 DIPG patients a year • ~60 are predicted to have ACVR1 mutations • If only large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified • 60 patients/year × 3 years = 180 children’s lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR • From Adam Resnick
  17. How to promote departmental/institutional openness? • Encourage persistent identifiers, e.g., ORCID • Encourage preprints • Encourage Open Access (OA) • Recognize openness in hiring and P&T • Teach open scholarship • Promote institutional openness – repositories, Wikimedian in residence • Support institutional open data governance
  18. NIH Strategic Plan for Data • Support a Highly Efficient and Effective Biomedical Research Data Infrastructure • Promote Modernization of the Data-Resources Ecosystem • Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools • Enhance Workforce Development for Biomedical Data Science • Enact Appropriate Policies to Promote Stewardship and Sustainability • https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf
  19. Research Data Infrastructure … Both funders and some institutions see the need to move from pipes to platforms to accelerate research… https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png
  20. If platforms are the answer we could ask the question… Will biomedical research become more like Airbnb? • Vivien Bonazzi, Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  21. I am not crazy, hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: – 60 million users searching 2 million listings in 192 countries – Average of 500,000 stays per night – Valuation of US $25bn • Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  22. Platforms will ultimately digitally integrate the scholarly workflow for human and machine analysis • Should biomedical research be like Airbnb? doi: 10.1371/journal.pbio.2001818
  23. Supplier / Platform / Consumer • Supplier and consumer pairs: Paper Author and Paper Reader; Data Provider and Data Consumer; Employer and Employee; Reagent Provider and Reagent Consumer; Software Provider and Software Consumer; Grant Writer and Grant Reviewer; Educator and Student • Platform examples: MS Project, Google Drive, Coursera, ResearchGate, Academia.edu, Open Science Framework, Synapse, F1000, Rio • Pilot Open Data Lab (ODL) underway
  24. Why a comparison to Airbnb is not fair • Airbnb was born digital • The exchange of services on Airbnb is simple compared to what is required of a platform to support biomedical research • Nevertheless there is much to be learnt
  25. Impediments to a biomedical platform • Current work practices by all stakeholders • Entrenched business models • Size of the undertaking, aka resources needed • Trust • Incentives to use the platform • http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
  26. Such platforms combined with emerging analytics will likely have significant impact on biomedical engineering
  27. Machine learning has been around for over 20 years – why now? • Amount of data available for training • Open source – R and Python • Advances in computing (e.g., GPUs) allow for deeper neural nets (deep learning) • Algorithmic efficiency gains (e.g., in backpropagation) • Success promotes further research • Commercialization • Pastur-Romay et al. 2016 doi:10.3390/ijms17081313
  28. Let me touch on our research in protein engineering oh so briefly… • Structural Biology Meets Data Science – Does Anything Change? Crowd source: Current Opinion in Structural Biology 2018 https://docs.google.com/document/d/1rD3Qh1btTYlnGkKefNGSFVq8v_mqRNa8I0o5MP3ZMW4/edit
  29. Are there new scaffolds out there that Nature has yet to discover but AI could? • There are ~20^300 possible proteins >>>> all the atoms in the Universe • 96M protein sequences from 73,000 species (source: RefSeq) • 135,000 protein structures yield 1221 folds (SCOPe 2.06)
  30. At DeepMind, which is based in London, AlphaGo Zero is working out how proteins fold, a massive scientific challenge that could give drug discovery a sorely needed shot in the arm.
  31. http://cartertoons.com/
  32. How should academic institutions think about exploiting data science?
  33. Organization: core data science verticals • Data Acquisition • Data Integration & Engineering • Machine Learning & Analytics • Visualization & Dissemination • Ethics, Law, Policy, Social Implications
  34. Organization: interdisciplinary horizontals • Verticals: Data Acquisition; Data Integration & Engineering; Machine Learning & Analytics; Visualization & Dissemination; Ethics, Law, Policy, Social Implications • Horizontal: Biomedical Engineering
  35. Data Acquisition • Sensors • Nanotechnology • Imaging • Unexpected sources, e.g., DMV
  36. Data Integration and Engineering • Ontologies • Object identifiers • Indexing schemes • Common data models
  37. Biomedical: Machine Learning & Analytics • Neural nets • Deep learning • Natural Language Processing (NLP) • Gene expression & neurological disease (Kipnis) • Predicting opioid overdose (VA Health) • Predicting escalating care and mortality risk of cirrhosis patients (UVA HS) • Human microbiome & mental health in maternal health (Psychology & Nursing)
  38. Biomedical: Visualization • Virtual Reality (VR) • Networks • Sonics • Visualizing microbial stability (Biology & Systems)
  39. Ethics, Law, Policy & Social Implications • Data sharing • Privacy • Normativity • Wendy Novicoff, Ph.D.
  40. Conclusion: Driven by large amounts of open digital data of different types and by new algorithms and approaches, biomedical researchers are destined to follow the private sector towards the fourth paradigm
  41. Acknowledgements • The BD2K Team at NIH • My colleagues at UVA • The 150 folks who have passed through my laboratory: https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
  42. Thank You • peb6a@virginia.edu
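
Several figures in the deck are simple arithmetic: the zettabyte-to-petabyte conversion and two-year doubling on slide 6, the NCBI/NLM share on slide 9, the patient count on slide 16, and the protein sequence-space comparison on slide 29. The Python sketch below re-derives them and is not part of the presentation. All function names are hypothetical. It assumes decimal (SI) prefixes and a constant doubling period. It reads the slide 29 figure as 20^300, meaning 20 amino acids at each of 300 positions. It also takes a commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe.

```python
import math

# Assumption: SI prefixes, 1 ZB = 10**21 bytes and 1 PB = 10**15 bytes.
PB_PER_ZB = 10**6

# Assumption: commonly quoted order-of-magnitude estimate, not from the slides.
ATOMS_IN_UNIVERSE = 10**80


def zettabytes_to_petabytes(zb: float) -> float:
    """Slide 6: express a volume given in zettabytes in petabytes."""
    return zb * PB_PER_ZB


def projected_volume(start: float, years: float, doubling_years: float = 2.0) -> float:
    """Slide 6: volume after `years` if it doubles every `doubling_years`."""
    return start * 2 ** (years / doubling_years)


def share_percent(part: float, whole: float) -> float:
    """Slide 9: `part` as a percentage of `whole`."""
    return 100.0 * part / whole


def patients_over_period(per_year: int, years: int) -> int:
    """Slide 16: patients affected over a period at a constant yearly rate."""
    return per_year * years


def sequence_space(alphabet_size: int = 20, length: int = 300) -> int:
    """Slide 29: number of distinct sequences of a given length (exact integer)."""
    return alphabet_size ** length


if __name__ == "__main__":
    print(f"2.7 ZB = {zettabytes_to_petabytes(2.7):.3g} PB")
    print(f"After 10 years of doubling: {projected_volume(2.7, 10):.1f} ZB")
    print(f"NCBI/NLM share of 650 PB: {share_percent(20, 650):.1f}%")
    print(f"Patients over 3 years: {patients_over_period(60, 3)}")
    # math.log10 accepts arbitrarily large Python integers.
    exponent = math.floor(math.log10(sequence_space()))
    print(f"20^300 is on the order of 10^{exponent}")
    print(f"Exceeds assumed atom count: {sequence_space() > ATOMS_IN_UNIVERSE}")
```

Python integers are arbitrary precision, so the sequence-space count is computed exactly instead of overflowing a float. The doubling projection is only as good as the constant-rate assumption behind it.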