1. Open Data in a Global Ecosystem
Philip E. Bourne Ph.D., FACMI
Associate Director for Data Science
National Institutes of Health
philip.bourne@nih.gov
BioMedBridges, EBI, November 17, 2015
http://www.slideshare.net/pebourne
4. Perspective
Structural bioinformatics researcher
Former custodian of the RCSB PDB
Obsessive about open science e.g., PLOS
NIH-wide responsibility for developments in
data science
6. The History of Computational
Biomedicine According to Bourne
1980s, 1990s, 2000s, 2010s, 2020
Discipline:
Unknown, Expt. Driven, Emergent, Over-sold, A Service, A Partner, A Driver
The Raw Material:
Non-existent, Limited/Poor, More/Ontologies, Big Data/Siloed, Open/Integrated
The People:
No name, Technicians, Industry recognition, Data scientists, Academics
Searls (ed) The Roots in Bioinformatics Series PLOS Comp Biol
7. It Follows …
We are entering a period of disruption
in biomedical research and we should
all be thinking about what this means
to bioinformatics & biomedicine
http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg
http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg
8. Big Data in Biomedicine…
This speaks to something more
fundamental than more data …
It speaks to new methodologies, new
skills, new emphasis, new cultures,
new modes of discovery …
9. We are at a Point of Deception …
Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
10. Disruption: Example - Photography
Digitization
Deception
Disruption
Demonetization
Dematerialization
Democratization
Time
Volume, Velocity, Variety
Digital camera invented by
Kodak but shelved
Megapixels & quality improve slowly;
Kodak slow to react
Film market collapses;
Kodak goes bankrupt
Phones replace
cameras
Instagram,
Flickr become the
value proposition
Digital media becomes bona fide
form of communication
11. Disruption: Biomedical Research
Digitization of Basic &
Clinical Research & EHRs
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
14. “And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama
January 30, 2015
Disruptive Features – New Science
15. Precision Medicine Initiative
National Research Cohort
– >1 million U.S. volunteers
– Numerous existing cohorts (many funded by NIH)
– New volunteers
Participants will be centrally involved in design and
implementation of the cohort
They will be able to share genomic data, lifestyle
information, biological samples – all linked to their
electronic health records
16. What Are Some General Implications
of Such a Future?
Open collaborative science becomes increasingly
important nationally and internationally
Data and associated analytics become
increasingly valuable to scholarship
Opportunities exist to improve the efficiency of the
research enterprise and hence fund more research
Global cooperation between funders will be needed
to sustain the emergent digital enterprise
Current training content and modalities will not match
supply to demand
Balancing accessibility vs security becomes more
important yet more complex
18. How Should We Respond as Funders?
Community:
– Encourage wherever possible a global cultural shift towards
open science
– Encourage global exchanges
– Encourage global projects
Policies:
– Understand and map data sharing policies, standards etc.
– Understand ethical, legal and societal differences
Infrastructure:
– Share the burden and the reward
21. A Culture of Sharing
1999: Research Tools Policy
2003: NIH Data Sharing Policy
2004: Model Organism Policy
2007: Genome-wide Association (GWAS) Policy
2008: NIH Public Access Policy (Publications)
2012: Big Data to Knowledge (BD2K) Initiative
2014: Genomic Data Sharing (GDS) Policy
Modernization of NIH Clinical Trials
White House Initiative (2013 “Holdren Memo”)
26. The Commons is a shared virtual space which is
FAIR:
– Find
– Access (use effectively)
– Interoperate
– Reuse
An environment to find and catalyze the use of
shared digital research objects
The Commons
Concept
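As a rough illustration of the FAIR idea, a shared digital research object could carry a small metadata record that a catalog can search. This is only a sketch: the `ResearchObject` fields, the `find` helper, and the mapping of each field to a FAIR letter are assumptions made for this example, not part of the Commons design, and the catalog entries are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchObject:
    """Hypothetical metadata record for one shared digital research object."""
    identifier: str             # Find: a stable identifier to cite and look up
    keywords: tuple[str, ...]   # Find: terms a search can match
    access_url: str             # Access: where the object can be retrieved
    data_format: str            # Interoperate: a declared format tools can read
    license: str                # Reuse: the terms under which it may be reused


def find(catalog: list[ResearchObject], keyword: str) -> list[ResearchObject]:
    """Return catalog entries whose keywords include the term (case-insensitive)."""
    term = keyword.lower()
    return [obj for obj in catalog if term in (k.lower() for k in obj.keywords)]


# Made-up entries; the identifiers and URLs are placeholders, not real resources.
catalog = [
    ResearchObject("obj-001", ("genomics", "variants"),
                   "https://example.org/obj-001", "VCF", "CC0"),
    ResearchObject("obj-002", ("imaging",),
                   "https://example.org/obj-002", "DICOM", "CC-BY"),
]
hits = find(catalog, "Genomics")
print([obj.identifier for obj in hits])
```

Keeping the record immutable (`frozen=True`) reflects one possible design choice: metadata attached to a published identifier should not silently change.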
27. The Developer or User Defines the
Environment from the Appropriate
Building Blocks
30. Public Beacons
Host: Content
AMPLab: 1000 Genomes Project
Broad Institute: ExAC
Curoverse: PGP, GA4GH Example Data
EBI: 1000 Genomes Project, UK10K, GoNL, EVS, GEUVADIS, UMCG Cardio GenePanel
Google: 1000 Genomes Project, Phase III, Illumina Platinum Genomes
ISB: Known VARiants
NCBI: NHLBI Exome Sequence Project
OICR: 55 cancer datasets
SolveBio: 56 public datasets
UCSC: ClinVar, LOVD, UniProt
University of Leicester: Cafe CardioKit, Cafe Variome Central
WTSI: IBD, Native American, Egyptian, UK10K
Over 120 public datasets beaconized across 12 institutions
Tens of thousands of individuals
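The idea behind a beacon can be sketched in a few lines: each host answers only yes or no to the question "do you hold this allele at this position?", so no individual-level records leave the host. The sketch below is a toy model under stated assumptions, not the GA4GH Beacon API: the `Beacon` class, `has_allele`, `query_all`, the host names, and the variants are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field

# A variant is identified here by (chromosome, position, alternate allele).
Variant = tuple[str, int, str]


@dataclass
class Beacon:
    """A toy beacon: one host answering presence/absence queries."""
    host: str
    variants: set[Variant] = field(default_factory=set)

    def has_allele(self, chrom: str, pos: int, alt: str) -> bool:
        # Only a yes/no answer leaves the host; no records are returned.
        return (chrom, pos, alt) in self.variants


def query_all(beacons: list[Beacon], chrom: str, pos: int, alt: str) -> dict[str, bool]:
    """Ask every beacon the same question and collect the answers by host."""
    return {b.host: b.has_allele(chrom, pos, alt) for b in beacons}


# Hypothetical hosts and made-up variants, for illustration only.
beacons = [
    Beacon("host-a", {("1", 1000, "T"), ("2", 2000, "G")}),
    Beacon("host-b", {("1", 1000, "T")}),
    Beacon("host-c", set()),
]
answers = query_all(beacons, "1", 1000, "T")
print(answers)
```

Querying many hosts with one question is what makes a network of beacons useful for discovery: a researcher learns where a variant has been seen before requesting access to any underlying data.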
32. Commons - Pilots
The Cloud Credits - business model
BD2K Centers
MODs (Model Organism Databases)
HMP Data and tools available in the cloud
NCI Cloud Pilots & Genomic Data
Commons
33. I not only use all the brains
I have, but all I can borrow.
– Woodrow Wilson
34. What Can We Do Now?
Extend the research pilots
concept
Have TCC & TeSS work
together
Global hackathons,
competitions
Closer ties between NLM and
EBI / Elixir
Student exchanges
Engage foundations, charities
in more global initiatives
http://wwwdev.ebi.ac.uk/Tools/ddi/
36. NIH…
Turning Discovery Into Health
philip.bourne@nih.gov
https://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/
Editor's Notes
Photos: FC tweet; RK screen grab
Images of people from Infographic (NOTE: Image is just a placeholder—Jill will tweak)
Detailed Notes:
National Research Cohort <<OR name of study>>
>1 million U.S. volunteers committed to participating in research
Will combine a number of existing cohorts
Will include Dept of Veterans Affairs Million Veteran Program—note Veteran is singular per http://www.research.va.gov/MVP/
On this slide we have a list of Beacon providers and the content that they're serving. To date we have over 120 public datasets that have been made available via Beacons at 12 different institutions. This represents data from tens of thousands of individuals, and these metrics, the numbers of datasets and individuals that they represent