Real-World Data Challenges: Moving Towards Richer Data Ecosystems
1. | 1
Anita de Waard 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
Big Data PI Meeting
March 16, 2016
4. | 4
Trend #3: Computers are scientists, too!
“intelligent systems for computer-aided
discovery can complement and integrate
into the insight generation loop in
scalable ways…”
Source: Computer-Aided Discovery: Toward Scientific Insight Generation with Machine Support, http://ieeexplore.ieee.org/abstract/document/7515118/
“This work combines time series Principal
Component Analysis with InSAR to constrain
the space of possible model explanations on
current empirical data sets and achieve a better
identification of deformation patterns”
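As a rough illustration of what "time series Principal Component Analysis" means in the quote above, the sketch below runs PCA on a small synthetic displacement series. It is not the cited paper's method or data: the four "pixels", the spatial pattern, the linear trend and the helper names are all assumptions made up for this example, and the leading component is found with a plain power iteration.

```python
import math

def center_columns(rows):
    """Subtract each column's mean (one column per pixel)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance between the columns of already-centered data."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_component(cov, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, v

# Synthetic data (assumption): 40 epochs x 4 pixels. Every pixel follows the
# same linear trend, scaled by a fixed spatial pattern, plus a small wiggle.
SPATIAL_PATTERN = [1.0, 0.8, 0.3, 0.1]
series = [[0.1 * t * s + 0.01 * math.sin(t + k)
           for k, s in enumerate(SPATIAL_PATTERN)]
          for t in range(40)]

cov = covariance(center_columns(series))
eigenvalue, pattern = leading_component(cov)

# Share of total variance captured by the first principal component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Cosine similarity between the recovered and the true spatial pattern.
true_norm = math.sqrt(sum(s * s for s in SPATIAL_PATTERN))
similarity = sum(a * b for a, b in zip(pattern, SPATIAL_PATTERN)) / true_norm

print(f"explained variance: {explained:.4f}, similarity: {similarity:.4f}")
```

In this toy setting the first component recovers the shared deformation pattern, which is the sense in which PCA can narrow down the set of candidate model explanations.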
5. | 5
Raising many technical/organisational/policy questions:
• Is Long-Tail Data + Semantics = Big Data?
• Is Data Science a field, or a skill? (A department, or a class?)
• Are supercomputing centers research departments or bits of infrastructure? (And if
infrastructure, are they part of IT? “Oh, no, anything but that!”)
• Are repositories places to store outputs, or places where science is conducted?
• If so, how are repositories and HPC centers recognised and rewarded?
• How can we keep track of (micro)provenance of parts of data sets?
• Should we explore blockchain technology for this? (“Oh no, anything but that!”)
• Is a piece of software part of the University’s Research Outputs?
• If so, how do we reward brilliant coders who blog, but don’t write?
• How do we reward (virtual) collaboration?
• Why won’t those damn scientists share their data?
• Who will own the Data Science Cloud: Amazon? Or the joint HPC centers (NDS?)? Is the NIH
Data Commons the model? Or is this a free-for-all? What is the role of commercial
parties?
• Is data curation/stewardship a part of science, or a glorified administrator's job?
• What is the role of libraries, in all this?
• And why the hell is a publisher talking about it?
6. | 6
What Elsevier is Interested in: Supporting RDM Networks
[Figure: research data management workflow.
Research cycle stages: Find Topic; Identify gaps; Plan & Fund; Discover data, people, methods & protocols; Collect, analyze & visualize; Store, preserve & share; Publish; Prepare, reproduce, re-use & benchmark.
Connected systems: Faculty; Lab ELN(s); LIMS; Data center; Inst. Data Repositorie(s); Domain-specific Repositories; Data Journal; Journal (Link to article); Data search; General search.
Supporting steps: Data Management Plans (plan each step from experiment to publish); Metadata, methods & protocols ready for preservation and publishing; Publish data (under embargo); Secure discoverability in & outside the institution.]
7. | 7
What Elsevier is Interested in: Knowledge Graphs in Life Science
Biological Pathways extracted via semantic text mining:
• A upregulates B
• B upregulates C
• C increases disease D
(A → B → C → D)
Normalizing vocabularies required: proteins, diseases, drugs, chemicals
Bioactivities through text analysis: IC50 6.3 nM, kinase binding assay; 10 mM concentration
Chemical Structures and Properties
Vocabularies and identifiers: InChI, Name; NCBI, UniProt; EMTREE; ReaxysTree, Structures
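The pathway statements on this slide (A upregulates B, B upregulates C, C increases disease D) can be read as subject/relation/object triples in a small knowledge graph. The sketch below is a minimal illustration, not a description of Elsevier's actual pipeline: the synonym table, the raw statements and the function names are all hypothetical, and the table stands in for real vocabularies such as those named on the slide.

```python
from collections import defaultdict, deque

# Hypothetical synonym table: maps surface forms found in text to one
# canonical identifier. A real system would use curated vocabularies.
SYNONYMS = {
    "a": "A", "protein a": "A",
    "b": "B",
    "c": "C",
    "d": "D", "disease d": "D",
}

def normalize(term):
    """Return the canonical identifier for a term, or the term itself."""
    cleaned = term.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)

# Statements as they might come out of text mining, with inconsistent naming.
RAW_STATEMENTS = [
    ("Protein A", "upregulates", "B"),
    ("b", "upregulates", "C"),
    ("C", "increases", "disease D"),
]

def build_graph(statements):
    """Build an adjacency list keyed by normalized subject."""
    graph = defaultdict(list)
    for subject, relation, obj in statements:
        graph[normalize(subject)].append((relation, normalize(obj)))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns a list of triples, or None if no path."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None

graph = build_graph(RAW_STATEMENTS)
path = find_path(graph, "A", "D")
for subject, relation, obj in path:
    print(subject, relation, obj)
```

Without the normalization step, "Protein A", "b" and "disease D" would be separate nodes and the chain from A to D would not connect, which is why the slide lists normalizing vocabularies as a requirement.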