Big Data & Analytics Across the
Interdisciplinary Divide
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
12/17/18 BigDIA 1
@pebourne
Perspective
• I was not trained as a data scientist or computer scientist - I
started as a physical chemist
• At this point I can’t give you a deep technical perspective
• My examples are taken from biomedicine, but broadly
applicable
• Deeply engaged in preparing one academic institution for a very
different, data-driven interdisciplinary future
My motivation
The biggest gains for our society are going to come
through interdisciplinary research where data and
analytics catalyze the collaboration
Consider a wake up call of sorts
A wake up call of sorts
https://www.sciencemag.org/news/2018/12/google-s-deepmind-aces-protein-folding
https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/
Data as driver
https://www.ebi.ac.uk/uniprot/TrEMBLstats
Contents of the Protein Data Bank
This is a somewhat predictable outcome..
The real excitement comes from the unexpected …
Witness the tale of the trauma surgeon …
But there is more…
Air pollution-ecosystem feedback: unmanned
aerial vehicles and ecosystem models to
quantify ozone-forest interactions
• Spatial heterogeneity
• Novel sampling
• Sensor data
Departments:
Environmental Sciences
Electrical Engineering
A working definition of what we are doing …
It is the unexpected re-use of information which is
the value added by the web
Tim Berners-Lee
https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#116a5a2d55cf
A working definition of what we are doing …
It is the unexpected re-use of information which is
the value added by the web and subsequent
analysis of that information for societal benefit
Tim Berners-Lee / Phil Bourne
https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#116a5a2d55cf
Of course this was all predicted by smart people ..
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808
I would suggest that this audience has a
responsibility to promote the fourth paradigm
which is not a well recognized phenomenon across
disciplines …
Here is one example of how to do so
How Will Science Change?
Example - Photography
[Figure: stages of change plotted over time against Volume, Velocity, Variety]
Stages: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
• Digital camera invented by Kodak but shelved
• Megapixels & quality improve slowly; Kodak slow to react
• Film market collapses; Kodak goes bankrupt
• Phones replace cameras
• Instagram, Flickr become the value proposition
• Digital media becomes a bona fide form of communication
From a presentation to the Advisory Board to the NIH Director
Systems Pharmacology
[Figure: open, complex, diverse digital data organized along three axes: model transportability, horizontal integration and multi-scale integration]
• Model transportability: human, mouse, zebrafish
• Multi-scale integration, with the data types shown at each scale:
– DNA: CNV, SNP, methylation
– Gene/Protein: 3D structure, gene expression, proteomics, metabolomics
– Network: metabolic, signaling transduction, gene regulation
– Cell: hepatic, myoepithelial, erythrocyte
– Tissue: epithelial, muscle, nervous
– Organ: liver, kidney, pancreas, heart
– Body: physiologically based pharmacokinetics
– Population: GWAS, population dynamics, microbiota
Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
How should we think about organizing ourselves in
an interdisciplinary way to maximize the
opportunities offered by the fourth paradigm?
The Pillars of Data Science
Application Domains
Let's briefly focus on those five pillars
in the context of one area of
biomedical informatics – structural
bioinformatics
What kinds of interchange should be
taking place between this field and
data science?
Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
Data Acquisition
• Persistence of raw data not clear
• Some level of consistency across instrument manufacturers
• Lessons in community/society drive
Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
Data Integration and Engineering
• URIs – no, steeped in tradition
• Ontologies – somewhat
• Linked data – somewhat
Years of experience to convey
Data Analytics
– SVMs
– Random forest
– Neural nets
– Deep learning
– ??
Opportunity to learn from many domains
Visualization & Dissemination
• Avoid the curse of the ribbon
• Think sonics
• Look to video games
Ethics, Law & Policy –
Community Driven Data Sharing
How to implement this at any level?
Guiding Principles
• Be constantly strategic and nimble - think supply chain
• Be sustainable - do not overreach
• Be interdisciplinary
• Be an organization without walls
• Be diverse, accessible and open
• Be team driven, not individually driven
• Strive for quality, not quantity, in education & research
• Be innovative and translational through new forms of engagement with
the private sector, government, NGOs, local, state, national and
international partners
Be Interdisciplinary – Be Without Walls
• Satellites – discipline driven - located in another School
focusing on the mission of that School where data and
analytics play a role, e.g.,
– SOM – data governance and clinical translation
– Education – working on educational analytics
• Centers – Focus area driven e.g.
– Ethics and justice
– Neurodegenerative disorders – Alzheimer's, autism, TBI
– Sports analytics
Be Diverse, Accessible and Open – Why?
• Data science exists largely because of open data
• Open knowledge encourages disciplinary and interdisciplinary
collaboration
• Yet much of the scholarship we produce is not accessible at all and
certainly not accessible to socioeconomically disadvantaged groups
• Gouging by commercial knowledge providers is making the
knowledge produced by others less accessible to us
• Research is suffering from a reproducibility crisis addressable
through greater access to all aspects of the research lifecycle
Be Diverse, Accessible and Open – Why?
Consider Biomedicine
• Big Data
– Total data from NIH-funded research back in 2016 estimated at 650
PB*
– 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10
PB in 2016
• Dark Data
– Only 12% of data described in published papers is in recognized
archives – 88% is dark data^
• Cost
– 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data
archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
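The proportions quoted above can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the figures on the slide; the variable names are illustrative and do not come from any NIH data source.

```python
# Figures taken from the slide; names are illustrative only.
total_nih_pb = 650        # estimated total data from NIH-funded research (PB)
ncbi_nlm_pb = 20          # portion held at NCBI/NLM (PB)
archived_fraction = 0.12  # share of published data found in recognized archives

# Share of the total that sits in NCBI/NLM, as a percentage.
ncbi_share_pct = 100 * ncbi_nlm_pb / total_nih_pb

# Whatever is not in a recognized archive is counted as dark data.
dark_share_pct = 100 * (1 - archived_fraction)

print(f"NCBI/NLM share: {ncbi_share_pct:.1f}%")
print(f"Dark data: {dark_share_pct:.0f}%")
```

The NCBI/NLM share comes out just above 3%, which the slide rounds to 3%.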
A call for making these data open
• Mandates
– NIH, NSF, Data Management Plans
• Business models can be
protected yet everyone benefits
• It saves lives ….
Why a more open process?
Use case:
Diffuse Intrinsic Pontine Gliomas (DIPG)
• Occur in 1 in 100,000 individuals
• Peak incidence at 6-8 years of age
• Median survival 9-12 months
• Surgery is not an option
• Chemotherapy is ineffective and the benefit of radiotherapy is only transitory
From Adam Resnick
Timeline of genomic studies in DIPG
• Landmark studies identify
histone mutations as
recurrent driver mutations in
DIPG ~2012
• Almost 3 years later, in
largely the same datasets,
but partially expanded, the
same two groups and 2
others identify ACVR1
mutations as a secondary,
co-occurring mutation
From Adam Resnick
What do we need to do differently
to reveal ACVR1?
• ACVR1 is a targetable kinase
• Inhibition of ACVR1 inhibited tumor
progression in vitro
• ~300 DIPG patients a year
• ~60 are predicted to have ACVR1 mutations
• If only large-scale data sets had been
integrated with TCGA and/or rare
disease data in 2012, ACVR1 mutations
would have been identified
• 60 patients/year × 3 years = 180
children’s lives (who likely succumbed
to the disease during that time) could
have been impacted if only data were
FAIR
From Adam Resnick
Research Data Infrastructure …
Both funders and some institutions
see the need to move from pipes to
platforms to accelerate research…
https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png
If platforms are the answer we could
ask the question…
Will {biomedical} research become
more like Airbnb?
Vivien Bonazzi
Should biomedical research be Like Airbnb?
doi: 10.1371/journal.pbio.2001818
I am not crazy, hear me out
• Airbnb is a platform that supports a trusted relationship between consumer
(renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and
consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
– 60 million users searching 2 million listings in 192 countries
– Average of 500,000 stays per night.
– Valuation of US $25bn
Should biomedical research be Like Airbnb?
doi: 10.1371/journal.pbio.2001818
Platforms will ultimately digitally
integrate the scholarly workflow for
human and machine analysis
Supplier / Consumer
Paper Author / Paper Reader
Data Provider / Data Consumer
Employer / Employee
Reagent Provider / Reagent Consumer
Software Provider / Software Consumer
Grant Writer / Grant Reviewer
Educator / Student
Platforms: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio
Pilot Open Data Lab
(ODL) underway
The NIH, through the Big Data to Knowledge
(BD2K) initiative, is experimenting with a platform,
keeping in mind the need to overcome these
impediments
Enter The Commons
https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg
Commons –
Initial focus is on integrating two
layers of the scholarly workflow
Commons topology
Compute Platform: Cloud or HPC
Services: APIs, Containers, Indexing,
Software: Services & Tools
scientific analysis tools/workflows
Data
“Reference” Data Sets
User defined data
Digital Object Compliance
App store/User Interface
PaaS
SaaS
IaaS
https://datascience.nih.gov/commons
Commons Compliance
• Treat products of research – data,
methods, papers etc. as digital objects
• These digital objects exist in a shared
virtual space
• Digital object compliance through FAIR
principles:
– Findable
– Accessible (and usable)
– Interoperable
– Reusable
https://commonfund.nih.gov/bd2k/commons
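One way to make digital object compliance concrete is a checklist applied to each object's metadata. The sketch below is a hypothetical illustration in Python, not the Commons implementation: the `DigitalObject` fields and the rule chosen for each FAIR principle are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical metadata record for a research product (data, method, paper)."""
    identifier: str = ""    # persistent identifier, e.g. a DOI (assumed field)
    access_url: str = ""    # where the object can be retrieved (assumed field)
    data_format: str = ""   # open, documented format name (assumed field)
    license: str = ""       # reuse terms (assumed field)
    keywords: list = field(default_factory=list)


def fair_report(obj: DigitalObject) -> dict:
    """Return a crude pass/fail flag per FAIR principle (illustrative rules only)."""
    return {
        "findable": bool(obj.identifier) and bool(obj.keywords),
        "accessible": bool(obj.access_url),
        "interoperable": bool(obj.data_format),
        "reusable": bool(obj.license),
    }


example = DigitalObject(
    identifier="doi:10.0000/example",  # placeholder, not a real DOI
    access_url="https://example.org/dataset/1",
    data_format="CSV",
    keywords=["structural bioinformatics"],
)
report = fair_report(example)
print(report)  # the missing license leaves "reusable" False
```

A real compliance check would go much further (resolvable identifiers, machine-readable licenses, community vocabularies); the point here is only that each principle can be turned into a testable property of the object.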
Why a comparison to Airbnb is not fair
• Airbnb was born digital
• The exchange of services on Airbnb is
simple compared to what is required of a
platform to support biomedical research
Nevertheless there is much to be
learnt
Impediments to platforms
• Current work practices by all stakeholders
• Entrenched business models
• Size of the undertaking aka resources
needed
• Trust
• Incentives to use the platform
http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
Even if they are successful, platforms are likely to be
domain specific and only address the
infrastructure …
What else is needed?
We need to promote openness
• Encourage persistent identifiers e.g., ORCID
• Encourage preprints
• Encourage Open Access (OA)
• Recognize openness in hiring and P&T
• Teach open scholarship
• Promote institutional openness – repositories, Wikimedian in
residence
• Support institutional open data governance
• Support global community efforts….
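Persistent identifiers such as ORCID iDs carry a check character, so a basic validity test can run before an identifier is stored or shared. A minimal sketch in Python, assuming the ISO 7064 MOD 11-2 checksum that ORCID describes for its iDs; the function names are my own.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape (15 digits plus a digit or X, hyphens ignored) and the checksum."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is a commonly cited sample iD.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A passing checksum only shows the iD is well formed; confirming that it belongs to a given researcher still requires a lookup against the ORCID registry.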
Wikidata – fast growing
• Get on board with developments in schema.org, knowledge
graphs, etc… as part of the rule rather than the exception
• Provide metadata and opinion for data we produce or use
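Providing metadata in a form that schema.org-aware crawlers and knowledge graphs can read often means publishing a JSON-LD description alongside the data. The sketch below builds one in Python for a schema.org `Dataset`; the property names are standard schema.org terms, while every value (name, URLs, identifier, license choice) is a placeholder invented for illustration.

```python
import json

# All values are placeholders; only the schema.org property names are real.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example protein structure annotations",
    "description": "Illustrative record showing what a crawler could index.",
    "identifier": "https://doi.org/10.0000/example",
    "url": "https://example.org/datasets/annotations",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["structural bioinformatics", "open data"],
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

# Serialize to JSON-LD text, ready to embed in a web page describing the dataset.
json_ld = json.dumps(dataset_metadata, indent=2)
print(json_ld)
```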
Let me summarize:
How do we address the interdisciplinary divide?
• Promote the fourth paradigm
• Work within your institutions to promote data science as an
interdisciplinary field
• Establish an open and integrated environment for data and
analytics
• Be patient and do not oversell …
Haas & Schmidt 2018
http://iswc2018.semanticweb.org/workshops-tutorials/#ekg
Acknowledgements
The BD2K Team at NIH
The 150 folks who have passed through my laboratory
https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
Thank You
peb6a@virginia.edu
Accessible design: Minimum effort, maximum impact
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

Big Data and Analytics Across the Interdisciplinary Divide

  • 1. Big Data & Analytics Across the Interdisciplinary Divide Philip E. Bourne PhD, FACMI Stephenson Chair of Data Science Director, Data Science Institute Professor of Biomedical Engineering peb6a@virginia.edu https://www.slideshare.net/pebourne 12/17/18 BigDIA 1 @pebourne
  • 2. Perspective • I was not trained as a data scientist or computer scientist - I started as a physical chemist • At this point I can’t give you a deep technical perspective • My examples are taken from biomedicine, but are broadly applicable • Deeply engaged in preparing one academic institution for a very different data-driven interdisciplinary future
  • 3. My motivation: The biggest gains for our society are going to come through interdisciplinary research where data and analytics catalyze the collaboration
  • 4. Consider a wake-up call of sorts
  • 5. A wake-up call of sorts https://www.sciencemag.org/news/2018/12/google-s-deepmind-aces-protein-folding https://moalquraishi.wordpress.com/2018/12/09/alphafold-casp13-what-just-happened/
  • 6. Data as driver https://www.ebi.ac.uk/uniprot/TrEMBLstats Contents of the Protein Data Bank
  • 7. This is a somewhat predictable outcome. The real excitement comes from the unexpected … Witness the tale of the trauma surgeon … But there is more…
  • 8. Air pollution-ecosystem feedback: unmanned aerial vehicles and ecosystem models to quantify ozone-forest interactions • Spatial heterogeneity • Novel sampling • Sensor data Departments: Environmental Sciences, Electrical Engineering
  • 9. A working definition of what we are doing … It is the unexpected re-use of information which is the value added by the web. Tim Berners-Lee https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#116a5a2d55cf
  • 10. A working definition of what we are doing … It is the unexpected re-use of information which is the value added by the web and subsequent analysis of that information for societal benefit. Tim Berners-Lee / Phil Bourne https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#116a5a2d55cf
  • 11. Of course this was all predicted by smart people …
  • 13. I would suggest that this audience has a responsibility to promote the fourth paradigm, which is not a well-recognized phenomenon across disciplines … Here is one example of how to do so
  • 14. How Will Science Change?
  • 15. Example - Photography. [Figure: stages plotted against time and volume, velocity, variety.] Stages: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization. Events, in order: digital camera invented by Kodak but shelved; megapixels & quality improve slowly, Kodak slow to react; film market collapses, Kodak goes bankrupt; phones replace cameras; Instagram, Flickr become the value proposition; digital media becomes a bona fide form of communication. From a presentation to the Advisory Board to the NIH Director
  • 16. Systems Pharmacology: open, complex, diverse digital data. [Figure: model integration along three axes.] Model transportability: human, mouse, zebrafish. Multi-scale integration: DNA, gene/protein, network, cell, tissue, organ, body, population. Horizontal integration, by figure label: CNV, SNP, methylation; 3D structure, gene expression, proteomics, metabolomics; metabolic, signaling transduction, gene regulation; hepatic, myoepithelial, erythrocyte, epithelial; muscle, nervous; liver, kidney, pancreas, heart; physiologically based pharmacokinetics; GWAS, population dynamics, microbiota. Xie et al. Annu Rev Pharmacol Toxicol. 2017 57:245-262
  • 17. How should we think about organizing ourselves in an interdisciplinary way to maximize the opportunities offered by the fourth paradigm?
  • 18. The Pillars of Data Science; Application Domains
  • 19. Let's briefly focus on those five pillars in the context of one area of biomedical informatics – structural bioinformatics. What kinds of interchange should be taking place between this field and data science? Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
  • 20. Data Acquisition • Persistence of raw data not clear • Some level of consistency across instrument manufacturers • Lessons in community/society drive Mura et al. 2018 Curr Opin Struct Biol. 52:95-102
  • 21. Data Integration and Engineering • URIs – no, steeped in tradition • Ontologies – somewhat • Linked data – somewhat Years of experience to convey
  • 22. Data Analytics – SVMs – Random forest – Neural nets – Deep learning – ?? Opportunity to learn from many domains
  • 23. Visualization & Dissemination • Avoid the curse of the ribbon • Think sonics • Look to video games
  • 24. Ethics, Law & Policy – Community Driven Data Sharing
  • 25. How to implement this at any level?
  • 26. Guiding Principles • Be constantly strategic and nimble - think supply chain • Be sustainable - do not overreach • Be interdisciplinary • Be an organization without walls • Be diverse, accessible and open • Be team driven, not individually driven • Strive for quality not quantity in education & research • Be innovative and translational through new forms of engagement with the private sector, government, NGOs, local, state, national and international partners
  • 27. Guiding Principles • Be constantly strategic and nimble - think supply chain • Be sustainable - do not overreach • Be interdisciplinary • Be an organization without walls • Be diverse, accessible and open • Be team driven, not individually driven • Strive for quality not quantity in education & research • Be innovative and translational through new forms of engagement with the private sector, government, NGOs, local, state, national and international partners
  • 28. Be Interdisciplinary – Be Without Walls • Satellites – discipline driven, located in another School and focusing on the mission of that School where data and analytics play a role, e.g., – SOM – data governance and clinical translation – Education – working on educational analytics • Centers – focus area driven, e.g., – Ethics and justice – Neurodegenerative disorders – Alzheimer's, autism, TBI – Sports analytics
  • 29. Guiding Principles • Be constantly strategic and nimble - think supply chain • Be sustainable - do not overreach • Be interdisciplinary • Be an organization without walls • Be diverse, accessible and open • Be team driven, not individually driven • Strive for quality not quantity in education & research • Be innovative and translational through new forms of engagement with the private sector, government, NGOs, local, state, national and international partners
  • 30. Be Diverse, Accessible and Open – Why? • Data science exists largely because of open data • Open knowledge encourages disciplinary and interdisciplinary collaboration • Yet much of the scholarship we produce is not accessible at all and certainly not accessible to socioeconomically disadvantaged groups • Gouging by commercial knowledge providers is making the knowledge produced by others less accessible to us • Research is suffering from a reproducibility crisis addressable through greater access to all aspects of the research lifecycle
  • 31. Be Diverse, Accessible and Open – Why? Consider Biomedicine • Big Data – Total data from NIH-funded research back in 2016 estimated at 650 PB* – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB in 2016 • Dark Data – Only 12% of data described in published papers is in recognized archives – 88% is dark data^ • Cost – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives * In 2012 Library of Congress was 3 PB ^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
  • 32. A call for making these data open • Mandates – NIH, NSF, Data Management Plans • Business models can be protected yet everyone benefits • It saves lives …
  • 33. Why a more open process? Use case: Diffuse Intrinsic Pontine Gliomas (DIPG) • Occur in 1 in 100,000 individuals • Peak incidence 6-8 years of age • Median survival 9-12 months • Surgery is not an option • Chemotherapy ineffective and radiotherapy only transitory From Adam Resnick
  • 34. Timeline of genomic studies in DIPG • Landmark studies identify histone mutations as recurrent driver mutations in DIPG ~2012 • Almost 3 years later, in largely the same but partially expanded datasets, the same two groups and two others identify ACVR1 mutations as a secondary, co-occurring mutation From Adam Resnick
  • 35. What do we need to do differently to reveal ACVR1? • ACVR1 is a targetable kinase • Inhibition of ACVR1 inhibited tumor progression in vitro • ~300 DIPG patients a year • ~60 are predicted to have ACVR1 mutations • If only large-scale data sets had been integrated with TCGA and/or rare disease data in 2012, ACVR1 mutations would have been identified • 60 patients/year × 3 years = 180 children’s lives (who likely succumbed to the disease during that time) could have been impacted if only data were FAIR From Adam Resnick
  • 36. Research Data Infrastructure … Both funders and some institutions see the need to move from pipes to platforms to accelerate research… https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png
  • 37. If platforms are the answer we could ask the question… Will {biomedical} research become more like Airbnb? Vivien Bonazzi Should biomedical research be Like Airbnb? doi: 10.1371/journal.pbio.2001818
  • 38. I am not crazy, hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder • It seems to be working: – 60 million users searching 2 million listings in 192 countries – Average of 500,000 stays per night – Valuation of US $25bn Should biomedical research be Like Airbnb? doi: 10.1371/journal.pbio.2001818
  • 39. Platforms will ultimately digitally integrate the scholarly workflow for human and machine analysis Should biomedical research be Like Airbnb? doi: 10.1371/journal.pbio.2001818
  • 40. [Figure: a platform connecting suppliers and consumers.] Supplier / Consumer pairs: Paper Author / Paper Reader; Data Provider / Data Consumer; Employer / Employee; Reagent Provider / Reagent Consumer; Software Provider / Software Consumer; Grant Writer / Grant Reviewer; Educator / Student. Platform labels: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio. Pilot Open Data Lab (ODL) underway
  • 41. The NIH, through the Big Data to Knowledge (BD2K) initiative, is experimenting with a platform, keeping in mind the need to overcome these impediments. Enter The Commons https://en.wikipedia.org/wiki/Ealing_Common#/media/File:Ealing_Common_-_geograph.org.uk_-_17075.jpg
  • 42. [Figure: a platform connecting suppliers and consumers.] Supplier / Consumer pairs: Paper Author / Paper Reader; Data Provider / Data Consumer; Employer / Employee; Reagent Provider / Reagent Consumer; Software Provider / Software Consumer; Grant Writer / Grant Reviewer; Educator / Student. Platform labels: MS Project, Google Drive, Coursera, Researchgate, Academia.edu, Open Science Framework, Synapse, F1000, Rio. Commons – Initial focus is on integrating two layers of the scholarly workflow
  • 43. Commons topology. [Figure: layered stack.] Layers: Compute Platform: Cloud or HPC; Services: APIs, Containers, Indexing; Software: Services & Tools, scientific analysis tools/workflows; Data: “Reference” Data Sets, User defined data; App store/User Interface. Side labels: Digital Object Compliance; PaaS, SaaS, IaaS. https://datascience.nih.gov/commons
  • 44. Commons Compliance • Treat products of research – data, methods, papers etc. as digital objects • These digital objects exist in a shared virtual space • Digital object compliance through FAIR principles: – Findable – Accessible (and usable) – Interoperable – Reusable https://commonfund.nih.gov/bd2k/commons
  • 45. Why a comparison to Airbnb is not fair • Airbnb was born digital • The exchange of services on Airbnb is simple compared to what is required of a platform to support biomedical research Nevertheless there is much to be learnt
  • 46. Impediments to platforms • Current work practices by all stakeholders • Entrenched business models • Size of the undertaking aka resources needed • Trust • Incentives to use the platform http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133
  • 47. Even if they are successful, platforms are likely to be domain specific and only address the infrastructure. What else is needed?
  • 48. We need to promote openness • Encourage persistent identifiers e.g., ORCID • Encourage preprints • Encourage Open Access (OA) • Recognize openness in hiring and P&T • Teach open scholarship • Promote institutional openness – repositories, Wikimedian in residence • Support institutional open data governance • Support global community efforts…
  • 49. Wikidata – fast growing • Get on board with developments in schema.org, knowledge graphs, etc. as part of the rule rather than the exception • Provide metadata and opinion for data we produce or use
  • 50. Let me summarize: How do we address the interdisciplinary divide? • Promote the fourth paradigm • Work within your institutions to promote data science as an interdisciplinary field • Establish an open and integrated environment for data and analytics • Be patient and do not oversell …
  • 51. Haas & Schmidt 2018 http://iswc2018.semanticweb.org/workshops-tutorials/#ekg
  • 52. Acknowledgements The BD2K Team at NIH The 150 folks who have passed through my laboratory https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0
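
The back-of-envelope figures quoted on slides 31 and 35 can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the inputs are the numbers stated on the slides, while the helper `share_percent` and the variable names are hypothetical and not part of the talk.

```python
# Recheck the arithmetic quoted on slides 31 and 35.
# Inputs are the figures stated on the slides; names are illustrative only.


def share_percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Slide 31: 20 PB held at NCBI/NLM out of an estimated 650 PB in total.
ncbi_share = share_percent(20, 650)

# Slide 31: 12% of data is in recognized archives; the remainder is "dark".
dark_share = 100 - 12

# Slide 35: ~300 DIPG patients a year, ~60 predicted to carry ACVR1 mutations,
# and roughly 3 years between the histone and the ACVR1 findings.
acvr1_fraction = 60 / 300
children_over_gap = 60 * 3

print(f"NCBI/NLM share of NIH-funded data: {ncbi_share:.1f}%")
print(f"Dark data share: {dark_share}%")
print(f"Predicted ACVR1 fraction of DIPG patients: {acvr1_fraction:.0%}")
print(f"Children potentially affected over the gap: {children_over_gap}")
```

The slide's "3%" is the rounded value of 20/650, and the 180 figure follows directly from 60 patients a year over three years; both rest on the slide's own estimates rather than on independent data.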

Editor's Notes

  1. Model integration in systems pharmacology. Diverse models need to be integrated across multiple methodologies, multiple heterogeneous data sets, organismal hierarchy, and species (transportability).
  2. Distribution of kinases and the number of covalent small-molecule kinase inhibitors (CSKIs) for every targeted kinase across the human kinome
  3. $1.25bn per year to capture all data. After a significant effort at reduction, intramurally data is spread across > 60 data centers; imagine the extramural situation.
  4. A detailed description of the Commons Framework can be found at: https://datascience.nih.gov/commons