Pace of technology innovation, changes in publication, separating data generation from publishing insights. Given at the 2018 VIVO conference at Duke University.
Scientific Publication, Data
Sharing, Learning Health and
Incentives
Warren A. Kibbe, Ph.D.
Professor, Biostats & Bioinformatics
Chief Data Officer, Duke Cancer Institute
warren.kibbe@duke.edu
@wakibbe #VIVO18
Take homes
• Pace of consumer computing
• Data generation is no longer the
bottleneck
• 4th Industrial Revolution is here
• Biomedical research and medicine are
a data enterprise
• Data sharing as an accelerant
Science has evolved
• The roles of societies and journals
have slowly evolved during the past
150 years
• Our ability to generate data has
rapidly evolved in the past 40 years
• Sharing of knowledge, primary data,
and observations is throttled by our
current publication process
How did we get here
• A slight digression taking a narrow
slice of history, technology, and
science
First Industrial Revolution
Humans having access to cheap energy
to do work (steam)
William M. Connolley - Picture of the "Puffing Billy" steam
engine taken in the Science Museum on 2004/03/13.
https://en.wikipedia.org/wiki/Steam_engine
taken May 2018
First Industrial Revolution
In addition to changing manufacturing
and transportation, steam changed
printing
Meggs, Philip B. A History of Graphic Design. John Wiley & Sons, Inc. 1998. (p 132)
Scientific Publication 1.0
Journals, primarily
the output of
scientific societies,
began in the 1860s.
Peer review became
the norm after WWII
First issue of Nature, 1869
Second Industrial Revolution
Mass production, better materials
(steel) and manufacturing, distribution
of energy using electricity & petroleum
Robert Friedrich Stieler (1847–1908) - alte Postkarte, https://www.basf.com/de/company/about-us/history/1865-1901.html
Third Industrial Revolution
The Digital Revolution
2008-03-19 21:41 Transisto from Wikipedia
10 August 2016 Thomas Nguyen - Wikipedia
Mike1024 from wikipedia - University of
Warwick 2006
Fourth Industrial Revolution
Industry 4.0
• Communications
• Connectivity
• Ubiquitous
• Pervasive
• Internet of Things
• Embedded Sensors
• Process Automation
• Cloud Computing
Mass access to data generation, processing, visualization
Impact on Biomedical Research
Challenges and Opportunities
– Workforce!
– Ethics!
– Data management!
– New instrumentation & tech
– Computing, Analytics, Visualization,
Usability – Data Science
Biomedical research is a data-driven
enterprise
Biomedical Science 2.0
Workforce
– Need more Quantitative Scientists
– Biologically aware Data Scientists
– Data-aware researchers
– Formal and informal training
– Incentives for team science
– Professional recognition
Everyone in biomedical research and in
medicine needs to be ‘Data Savvy’
Biomedical Science 2.0
Data Management
– Data provenance
– Validated devices
– Credit for individual &
team contributions
– Persistence & immutability
– DOIs
– Reproducible workflows
Cloud computing and data commons have
many of the features we need
Biomedical Science 2.0
Some features of 2.0
• EHRs are now deployed
• Smart devices nearly ubiquitous
• Broadband a ‘human right’
• Patient experience opportunities
• Ability to scale research
Data management is crucial
Biomedical Science 2.0
• Instrumentation
– Next Gen Sequencing
– Mass Spec (proteomics, metabolomics)
– Digital Imaging (Pathology, Radiology)
– High Throughput Screening
– Sensors
Massive reduction in the cost of
generating datasets
Biomedical Science 2.0
• Computing & Data Science
• Modeling at many size and time scales
• Causal inference
• Width vs Depth
• Complex vs simple relationships
• Usability
• Visualization and Human Cognition
Deep learning, networks, mechanism,
prediction, testing and validation
Understanding Cancer
• Precision medicine will lead to a fundamental
understanding of the complex interplay between
genetics, epigenetics, nutrition, environment, and
clinical presentation, and will direct effective,
evidence-based prevention and treatment.
Ramifications across many aspects of health care
This change has been driven by improved technology (sequencing, imaging,
nanotech, drug development, computing) and the availability of data about
patient response to therapy
Scientific publication 2.0
• Journals are fully digital
• Workflow automated
• Open dissemination after embargo
• Open Access still not the norm
• Role of datasets and data publication
still in flux
• Data sharing of primary data still not
the norm
Scientific publication 2.0
• Observations (data!) are
accumulating at a rapid pace
• Insights, information, analytics, and
knowledge follow and conform to
more classic versions of peer review
and publication
• IMO - We need to separate data
sharing from knowledge sharing
Validation
• Validation and harmonization of
primary and secondary data are
crucial, but do not need to be done
through the current publication
process
Data Sharing Index
• We need metrics for data, software,
algorithm use, usability, conformance
• FAIR!
NIH
• NIH Strategic Plan for Data Science
https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan_for_Data_Science_Final_508.pdf
• Written by the NIH Scientific Data
Council
• About data management and analytics
• Misses the change from data generation
to data analytics
The printing press enabled social change through widely disseminating print – this increased the value of literacy and made censorship of information harder.
For the first time, people living in a society had access to much more than just the amount of work that they could do with their muscles or with domesticated animals. This opened up the possibility of creating machines and machinery at a very different scale. It also started to improve transportation.
Semiconductors, VLSI, miniaturization, hardware and software, transition from analog to digital devices
But people can make effective decisions on the same number of factors…
How can we use machine learning and other techniques to reduce cognitive overload?
+ve/–ve: protein expression levels; ALK: anaplastic lymphoma kinase; squamous is a cell type (epidermoid)