1. Towards a Platform for Global
Health
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
The National Institutes of Health
http://www.slideshare.net/pebourne
philip.bourne@nih.gov
2. Bias
• Worked on long standing data resources
– PDB, IEDB
• Systems pharmacology with emphasis
on the role of molecular structure
• AVC for innovation and industrial
alliances at UCSD
• Chief data officer for the National
Institutes of Health
• Open science zealot
https://www.boundless.com/psychology/textbooks/boundless-psychology-textbook/researching-psychology-2/bias-in-psychological-research-407/biases-in-experimental-design-validity-reliability-and-other-issues-132-12667/images/research-bias/
3. Before we look at platforms .. and thinking as a
funder .. I want to describe an emergent effort
that may have some valuable lessons for GA4GH
going forward in their relationship to funders
(something that should not be ignored)
Preprints
http://www.hdimagez.com/gifts-made-of-moneywallpapers/
4. What is a preprint?
• A complete manuscript/research report shared prior to/instead of
publication – on arXiv, 80% of preprints are published at a later date
• Not formally peer reviewed but may be commented on by the
community – depends on the preprint service
5. http://asapbio.org/
Preprints, long the realm of physicists, are gaining traction in the life sciences
Speeds up dissemination
Record of priority
More informed grant review
Negative data
✘ Fear of scooping
✘ Career disadvantage
✘ Inability to publish
✘ Quality: moderation only; no peer review
6. Status
• ASAPbio to issue RFI for what a central preprint service should look
like
• ~15 global funders (government and foundations) – the coalition of
the willing – defined basic principles to support such a service
• Collectively expect to fund ASAPbio to award a contract to build the
system
• While sustainability models should be sought, funders anticipate
funding a central service for 5-10 years at least
Endpoint
• Accelerated scientific outcomes through a human- and machine-accessible
corpus of open knowledge available to all
8. Perceived critical mission
Strong leadership
Leading scientists engaged
Significant community support
✖ Obvious endpoint/singular message
✖ Funders - coalition of the willing
✖ Identified champions within each funding body
http://asapbio.org/
9. Obvious endpoint/singular message
Possible Touchpoint to Funders:
“The partners in the Global Alliance are working together
to create a common framework of harmonized
approaches to enable the responsible, voluntary, and
secure sharing of genomic and clinical data.”
10. Funders too are increasingly looking at moving
from pipes to platforms (aka common
framework)..
What would such a platform look like? …
Sangeet Paul Choudary
http://platformthinkinglabs.com/start-here/
11. Making Biomedical Research
More Like Airbnb
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
The National Institutes of Health
http://www.slideshare.net/pebourne
philip.bourne@nih.gov
12. I am not crazy, hear me out
• Airbnb is a platform that supports a trusted relationship between
consumer (renter) and supplier (host)
• The platform focuses on maximizing the exchange of services
between supplier and consumer and maximizing the amount of trust
associated with a given stakeholder
• It seems to be working:
• 60 million users searching 2 million listings in 192 countries
• Average of 500,000 stays per night.
• Valuation of US $25bn
14. Why a comparison to Airbnb is not fair
• Airbnb was born digital
• The exchange of services on Airbnb is simple
compared to what is required of a platform to support
biomedical research
Nevertheless there is much to be learnt
16. Similar Processes Lead to Similar Resources
Author Submission via the Web | Depositor Submission via the Web
Syntax Checking | Syntax Checking
Review by Scientists & Editors | Review by Annotators
Corrections by Author | Corrections by Depositor
Publish – Web Accessible | Release – Web Accessible
Bourne, PLoS Comp. Biol. 2005 1(3) e34; de Waard, Nature Precedings 2010 10101/npre.2010.4742.1
17. What is different is the perceived value of each to
the research enterprise. That value difference is
diminishing in part because of openness,
accessibility, policy, governance, increased data
reuse and let's not forget other forms of madness…
18. The Analog-Digital Data Knowledge Cycle
P.E. Bourne, 2016, There is No Intelligent Life Down There
20. In summary there is not currently a widely
adopted single platform for the exchange of
services in biomedical research. Either there is a
platform per service or no platform at all. Why
have we not done better and what are the
impediments today?
21. Impediments to a biomedical platform
• Current work practices by all stakeholders
• Entrenched business models
• Size of the undertaking aka resources needed
• Trust
• Incentives to use the platform
http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
22. The NIH, through the Big Data to Knowledge
(BD2K) initiative, is experimenting with a platform, keeping
in mind the need to overcome these impediments
Enter The Commons
https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg
24. Commons topology
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
App store/User Interface
PaaS
SaaS
IaaS
https://datascience.nih.gov/commons
25. Commons compliance
• Treat products of research – data, methods,
papers etc. as digital objects
• These digital objects exist in a shared virtual
space
• Digital object compliance through FAIR
principles:
• Findable
• Accessible (and usable)
• Interoperable
• Reusable
The FAIR Principles
http://www.nature.com/articles/sdata201618
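As a rough illustration of what "digital object compliance" could mean in practice, the sketch below models a research product as a record and runs one presence check per FAIR principle. Everything in it is an assumption made for illustration: the DigitalObject class, its field names, the fair_report and is_compliant helpers, and the example identifier and URL are hypothetical and are not part of the Commons or of the FAIR specification, which asks for considerably more than a field being filled in.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record for one product of research (data, method, paper)."""
    identifier: str = ""                          # persistent identifier, e.g. a DOI
    metadata: dict = field(default_factory=dict)  # descriptive metadata
    access_url: str = ""                          # where the object can be retrieved
    data_format: str = ""                         # community-recognized format
    license: str = ""                             # terms under which it may be reused


def fair_report(obj: DigitalObject) -> dict:
    """Return one True/False flag per FAIR principle (simple presence checks only)."""
    return {
        "findable": bool(obj.identifier) and bool(obj.metadata),
        "accessible": bool(obj.access_url),
        "interoperable": bool(obj.data_format),
        "reusable": bool(obj.license),
    }


def is_compliant(obj: DigitalObject) -> bool:
    """An object passes this toy check only if all four flags are True."""
    return all(fair_report(obj).values())


if __name__ == "__main__":
    # Made-up example values; none refer to a real data set.
    example = DigitalObject(
        identifier="doi:10.0000/example",
        metadata={"title": "Example data set"},
        access_url="https://example.org/data/1",
        data_format="text/csv",
        license="CC-BY-4.0",
    )
    print(fair_report(example))
    print(is_compliant(example))
```

A real compliance service would likely validate identifiers, metadata schemas and access protocols rather than just test for non-empty fields; the point here is only that each principle can be turned into a machine-checkable property of a digital object.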
26. Mapping current BD2K activities to the commons topology
Activities:
• NIH + Community defined data sets
• Possible FOAs and CCM
• BD2K Centers, MODS, HMP & Interoperability Supplements
• Cloud credits model (CCM)
• BioCADDIE/Other Indexing
• NCI & NIAID Cloud Pilots
Topology:
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
App store/User Interface
https://datascience.nih.gov/commons
27. Prediction – funder activities are about to
accelerate and here lies the opportunity….
What are the funder incentives? ….
28. Incentives
• Airbnb
• Monetize unutilized space
• Ease of use
• New vacation experience
• Commons
• Need to improve rigor and
reproducibility
• Productivity
• Sustainability
• Education and training
• Opportunity to undertake elastic
compute on large complex data
29. NIH is committed to (and hopes other
funders will join)
• The Commons and the FAIR principles
• Pilots that test the feasibility of the platform for larger scale
development/adoption
• Provision of two large complex data sets in the Commons – TOPMed
and GTEx are obvious choices, others may surface
• Use cases that illustrate the feasibility and scientific value of:
• Access to a single data source
• Interoperability across data sources
30. Summary
• NIH has endorsed the Commons and the FAIR principles
• The Commons is the beginning of a platform from which to conduct
biomedical research
• Over the next 1-2 years we are conducting pilots to evaluate the
feasibility of the Commons
• If feasible the intent is to expand into additional layers of the
scholarly research lifecycle
• The global reach of GA4GH can foster a coalition of the willing
• Commons applications are an opportunity to provide a singular
message
31. “I really admire Airbnb as a pioneer of the sharing
economy and for building community. They've
found an elegant way to help hosts make more
money and for guests to have authentic
experiences. It brings those people together in a
unique way.”
Logan Green
32. “The Commons is an effort at creating a sharing
economy and for building community. We hope
for a more cost effective and productive research
environment while bringing people together in a
unique way.”
Phil Bourne
33. Speaking of a shared economy…
You are invited to contribute to a shared
document that describes this concept..
You will be acknowledged and the document put
forward for NIH clearance to be
blogged/preprinted/published….
http://tinyurl.com/hc4td5b
34. Acknowledgements
• ADDS Office: Vivien Bonazzi, Jennie Larkin, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso,
Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurick
• CIT: Debbie Sinmao, Andrea Norris
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI Cloud Pilots/ GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean
Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI),
Claire Schulkey (AI), Eric Choi (AI)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
35. NIH…
Turning Discovery Into Health
philip.bourne@nih.gov
https://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/