Research Objects for FAIRer Science
1. Research Objects for
FAIRer Science
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
VIVO/SciTS Conferences 6-8 August 2014, Austin, TX
2. Scientific publications have at least
two goals:
(i) to announce a result and
(ii) to convince readers that the
result is correct
…..
papers in experimental science
should describe the results and
provide a clear enough protocol to
allow successful repetition and
extension
Jill Mesirov
Accessible Reproducible Research
Science 22 Jan 2010: 327(5964): 415-416
DOI: 10.1126/science.1179653
Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the
Experimental Life (1985) Shapin and Schaffer.
3. Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the
Experimental Life (1985) Shapin and Schaffer.
Capturing, representing,
sharing the information
needed to understand how a
research result came about.
Context of results
• Inputs, outputs, process…
Context of resources
• Instruments, data, software,
people…
4. “An article about computational
science in a scientific publication
is not the scholarship itself, it is
merely advertising of the
scholarship. The actual
scholarship is the complete
software development
environment, [the complete
data] and the complete set of
instructions which generated the
figures.”
David Donoho, “Wavelab and Reproducible
Research,” 1995
datasets
data collections
standard operating
procedures
software
algorithms
configurations
tools and apps
codes
workflows
scripts
code libraries
services,
system software
infrastructure,
compilers
hardware
Morin et al., Shining Light into Black Boxes,
Science 13 April 2012: 336(6078) 159-160
Ince et al., The case for open computer programs,
Nature 482, 2012
5. “I can’t immediately reproduce the research in
my own laboratory. It took an estimated 280
hours for an average user to approximately
reproduce the paper.”
Phil Bourne
NIH BigWig for Data Science
6. a reproducibility paradox
big, fast,
complicated,
multi-step,
multi-type
multi-field
greater
expectations
of
reproducibility
diy publishing
greater access
14. • Collaboration –
Complementarity correlation
• Modellers share more than
Experimentalists
• Experimentalists reuse models
more than Modellers
• Active enclave sharing
• Public sharing tricky even after
publication, bribery and threats
• Data Hugging, Flirting and
Voyeurism
15. • Playground rules apply
• Fluid, transient collaborations >
membership mgt pain in a*se
• Shameless exploitation of PI
competitiveness & vanity
• PI & Funder leadership
• Pan project spawned
collaborations –YES!!!!
• But not necessarily visible to us.
16. Data discovery
Data assembly,
cleaning, and
refinement
Ecological Niche
Modeling
Statistical analysis
Data collection
Insights Scholarly Communication
& Reporting
Enclosed sea problem
(Ready et al., 2010)
Pilumnus hirtellus
Scientific
Workflows
17. BioSTIF
method
instruments and laboratory
materials
Data discovery
Data assembly,
cleaning, and
refinement
Ecological Niche
Modeling
Statistical analysis
Data collection
Insights Scholarly Communication
& Reporting
Method Matters!
19. "Mapping present and future predicted distribution patterns for a meso-grazer
guild in the Baltic Sea" by Sonja Leidenberger et al
20. 1st International Workshop on Social Object Networks (SocialObjects 2011), Boston, October 9th 2011.
Find, Click ‘n’ Go
File ‘n’ Forget
Specialist Curators
21.
Properties: What would you ask a publication if you could?
Identity and Description, Uniqueness, Authenticity: Who are you? Where and when were you born? Who were your parents (creators)?
Review, Reuse, and Repurpose: For which purpose were you conceived and have been used?
Inspection, Visualization, Annotations: What do you have inside?
Representation: How is your content structured?
Access Rights: May I access all your parts?
Adaptability: Which parts can I replace?
Evolution & Versioning, Provenance: What have they done to you? Who and when? Why did they do that?
Quality: Why are you relevant to me? Can I believe what you are saying or trust your results?
Reproducibility: Do you still produce the same results?
Fitness: Are you still working? How could I repair you?
Credit and attribution: How could I thank you? How could I talk about you?
25. Howard Ratner, STM Innovations Seminar 2012
was: Chair STM Future Labs Committee, CEO EVP Nature Publishing Group,
now: Director of Development for CHORUS (Clearinghouse for the Open Research of US)
http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5
http://www.myexperiment.org/packs/196.html
28. What The Commons* Is and Is Not
Is Not:
– A database
– Confined to one physical
location
– A new large
infrastructure
– Owned by any one group
Is:
– A conceptual framework
– Analogous to the Internet
– A collaboratory
– A few shared rules
• All research objects
have unique
identifiers
• All research objects
have limited
provenance
Philip E. Bourne Ph.D.
Associate Director for Data Science, National Institutes of Health
http://www.slideshare.net/pebourne
*The NIH BD2K Commons Framework $100 million in 2015
30. http://www.researchobject.org/
A Framework to Bundle and Relate multi-hosted
(digital) resources of a scientific experiment or
investigation using standard mechanisms & uniform
access protocols. Carriers of Research Context
Outputs are first class
citizens to be managed,
credited and tracked:
data, software
Research Objects
31. Links
• Recording & linking
together the
components of an
experiment
• Linking across
experiments.
34. repeat replicate
Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online
Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227.
Methods
(techniques, algorithms,
spec. of the steps)
Materials
(datasets, parameters,
algorithm seeds)
Experiment
Instruments
(codes, services, scripts,
underlying libraries)
Laboratory
(sw and hw infrastructure,
systems software,
integrative platforms)
Setup
reuse reproduce
Executable Research Object
35. same experiment
same set up
same lab
same experiment
same set up
different lab
same experiment
different set up
different experiment
some of same
Validate
reuse reproduce
repeat replicate
http://www.biomedcentral.com/biome/carole-goble-on-reproducible-research-what-it-really-means-how-to-reach-it/
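The four terms on this slide can be read as a small decision procedure. The sketch below assumes the slide pairs its labels with its conditions in order (repeat: same experiment, same set up, same lab; replicate: same experiment, same set up, different lab; reproduce: same experiment, different set up; reuse: different experiment, some of the same). The function name and its parameters are hypothetical and chosen only for illustration.

```python
def classify_rerun(same_experiment: bool, same_setup: bool, same_lab: bool) -> str:
    """Name the kind of re-run, following the assumed reading of the slide."""
    if not same_experiment:
        # A different experiment that borrows some of the same components.
        return "reuse"
    if not same_setup:
        # Same experiment, but run on a different set up.
        return "reproduce"
    # Same experiment and same set up: only the lab distinguishes the two cases.
    return "repeat" if same_lab else "replicate"


if __name__ == "__main__":
    print(classify_rerun(same_experiment=True, same_setup=True, same_lab=False))
```

Ordering the checks from the experiment down to the lab mirrors the slide: the lab only matters once the experiment and set up are known to be the same.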
36. Design
Execution
Result Analysis
Collection
Publish /
Report
Peer
Review
Peer
Reuse
Modelling
Can I repeat &
defend my
method?
Can I review / reproduce
and compare my results /
method with your results /
method?
Can I review /
replicate and certify
your method?
Can I transfer your
results into my
research and reuse
this method?
* Adapted from Mesirov, J. Accessible Reproducible Research Science 327(5964), 415-416 (2010)
Research Report
Prediction
Monitoring
Cleaning
37. specialist codes
libraries, platforms, tools
services
(cloud)
hosted
services
commodity
platforms
data collections
catalogues software
repositories
my data
my process
my codes
integrative
frameworks
gateways
49. Identity
Annotation
Aggregation
FAIR RO Core Model
DOIs
URIs
Handles
ORCID
Aggregations
Resource maps
Proxies
Annotation first
class and stand-off
Identity persistence
and resolution
Citation
W3C
OAM
OAI-
ORE
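The three parts of the core model (identity, aggregation, annotation) can be sketched as a small data structure. This is an illustration only, not the RO model itself: the class names, field names, and example URIs are hypothetical, and the annotation is kept "stand-off" simply by storing it beside the resources rather than inside them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    about: str    # identifier of the thing being described
    content: str  # identifier of the separate resource holding the description


@dataclass
class ResearchObject:
    identifier: str                                   # identity (e.g. a DOI or URI)
    aggregates: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def aggregate(self, resource: str) -> None:
        """Add a resource to the aggregation, ignoring repeats."""
        if resource not in self.aggregates:
            self.aggregates.append(resource)

    def annotate(self, about: str, content: str) -> None:
        """Attach a stand-off annotation; the annotated resource is left untouched."""
        self.annotations.append(Annotation(about, content))

    def annotations_about(self, resource: str) -> List[Annotation]:
        return [a for a in self.annotations if a.about == resource]


if __name__ == "__main__":
    ro = ResearchObject("https://example.org/ro/1")   # hypothetical identifier
    ro.aggregate("https://example.org/data/input.csv")
    ro.annotate("https://example.org/data/input.csv",
                "https://example.org/ro/1/annotations/provenance.ttl")
    print(ro.annotations_about("https://example.org/data/input.csv"))
```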
54. • RO Management
– Transportation / Access / Citation
– Id location of RO “container”
– Provenance of RO & contents
– Behaviour/lifecycle of RO & contents
– Policies
• RO Interpretation
– What the RO and its content mean
– How they can be compared and validated
– How they can be used, executed, linked
• Interpretation variations
– Type (e.g. Workflows)
– Discipline (e.g. Biology)
– Task (e.g. Discovery, Execution)
– Activity (e.g. Experiment)
Progression Levels
Management and Interpretation for Integrated Applications
57. Checklists
Versioning
Provenance
Dependencies
NISO-JATS
EXPO, ISA
JERM, OBI
MIAME, SBML
GIT
MIM Ontology
PROV
PAV
VoID
Puppet Docker
Make
PAV
RO Model roevo wfprov
wfdesc
SysBio Workflows
DCAT
Annotation
Profiles
.
Depth: how deeply
described
Coverage: how
much is covered.
Progression levels
Semantic Framework Experiment
VIVO-ISF
DC
58. Checklists
aka Minimum Information Models
Safety, quality, consistency
Validation, monitoring
Common in experimental
science
Checklists defined in terms of
the RO model and its
annotations
Services execute against
model and an RO’s
annotations
Zhao et al., A Checklist-Based Approach for Quality Assessment
of Scientific Information, 3rd Int. Workshop on Linked Science, 2013
Minim Checklist Ontology to
describe checklists
Must, Should…
Cardinalities…
Rules…
http://purl.org/net/mim/ns
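The idea of a service executing a checklist against an RO's annotations can be sketched as follows. This is not the Minim ontology or any Wf4Ever service: the `Requirement` class, the `evaluate` function, and the annotation keys are hypothetical stand-ins that only illustrate Must/Should levels and a minimum cardinality.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Requirement:
    level: str          # "MUST" or "SHOULD"
    annotation: str     # annotation key expected on the RO (hypothetical names)
    min_count: int = 1  # cardinality: how many values are needed at least


def evaluate(checklist: List[Requirement],
             annotations: Dict[str, List[str]]) -> Dict[str, object]:
    """Report which requirements the RO's annotations fail to meet."""
    missing: Dict[str, List[str]] = {"MUST": [], "SHOULD": []}
    for req in checklist:
        if len(annotations.get(req.annotation, [])) < req.min_count:
            missing[req.level].append(req.annotation)
    return {
        "satisfied": not missing["MUST"],   # only MUST failures fail the checklist
        "missing_must": missing["MUST"],
        "missing_should": missing["SHOULD"],
    }


if __name__ == "__main__":
    checklist = [
        Requirement("MUST", "title"),
        Requirement("MUST", "input", min_count=2),
        Requirement("SHOULD", "license"),
    ]
    print(evaluate(checklist, {"title": ["Niche model"], "input": ["a.csv"]}))
```

Treating SHOULD failures as warnings rather than errors is a design choice made here for the sketch; a real checklist service may grade them differently.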
59. Towards Smart Integrated Applications & Mediation
1. Id & Cite fluid things
2. First class citizenship &
uniform handling of artifacts
3. Compound
4. Mixed, leaky Containers
5. Span outcomes, evolve
outputs, emergence
6. Layered interpretation and
management profiles using
standards
7. Machine-processable
8. Technology Independent
Bechhofer et al., Why linked data is not enough for scientists,
DOI: 10.1016/j.future.2011.08.004
61. Research Objects Framework
a systematic approach to representing
a different unit of scholarship
“development” view “logical” view
“process” view “physical” view
SERVICES POLICIES
LIFECYCLES METADATA
PROFILES
63.
Open Archival Information System Pilot
ROs are “Information Packages”
ROManager
RODL
64. • A single, transferable object
encapsulates description and
resources
– Download, transfer, publish
• ZIP-based format + manifest
describes aggregation and
annotations
– Unpack with standard tooling
• JSON-LD for manifest
– Lightweight linked-data format
– Use JSON tooling and services
Baking with off the
shelf platforms
OMEX archive
bundle
Adobe
UCF
ORE PROV ODF
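A minimal sketch of the bundle idea above, using only the Python standard library: a ZIP archive whose manifest is a JSON document listing the aggregated resources and their annotations. The media type string, the `.ro/manifest.json` path, the context URL, and the manifest keys are assumptions based on the Research Object Bundle draft and should be checked against the specification; `build_bundle` is a hypothetical helper name.

```python
import io
import json
import zipfile
from typing import Dict, Iterable

# Assumed values; verify against the Research Object Bundle specification.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
MANIFEST_PATH = ".ro/manifest.json"
CONTEXT = "https://w3id.org/bundle/context"


def build_bundle(files: Dict[str, bytes],
                 annotations: Iterable[Dict[str, str]] = ()) -> bytes:
    """Pack resources plus a JSON-LD style manifest into one ZIP archive."""
    manifest = {
        "@context": [CONTEXT],
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
        "annotations": list(annotations),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Media type goes first and uncompressed so tools can sniff it.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for name in sorted(files):
            bundle.writestr(name, files[name])
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_bundle({"data/input.csv": b"a,b\n1,2\n"})
    with zipfile.ZipFile(io.BytesIO(data)) as bundle:  # unpack with standard tooling
        print(bundle.namelist())
```

Because the result is an ordinary ZIP file with a JSON manifest, it can be inspected with any unzip tool and any JSON parser, which is the point the slide makes about off-the-shelf tooling.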
65. • Work with local folder
structure.
– Version: GitHub.
– Metadata: Local tooling
– Metadata about aggregation
and its resources: “hidden
folder”
• Zenodo/figshare pull
snapshot from GitHub
– DOIs for aggregation
– new DOIs: release cycles
Baking with off the
shelf platforms
http://dx.doi.org/10.6084/m9.figshare.1031591
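The "hidden folder" approach can be sketched as a script that scans a working directory and records what it aggregates in a manifest kept inside a dot-folder, which is then versioned alongside everything else. The folder name `.ro`, the file name `manifest.json`, the manifest layout, and the function `snapshot_folder` are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List, Union

HIDDEN_DIR = ".ro"  # assumed name for the hidden metadata folder


def snapshot_folder(root: Union[str, Path]) -> Dict[str, List[str]]:
    """List the visible files under root and record them in a hidden manifest."""
    root = Path(root)
    resources = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
        # Skip anything inside a dot-folder (.git, .ro) and any dot-file.
        and not any(part.startswith(".") for part in path.relative_to(root).parts)
    )
    manifest = {"aggregates": resources}
    hidden = root / HIDDEN_DIR
    hidden.mkdir(exist_ok=True)
    (hidden / "manifest.json").write_text(json.dumps(manifest, indent=2),
                                          encoding="utf-8")
    return manifest


if __name__ == "__main__":
    print(snapshot_folder("."))
```

Running the function again after adding or removing files rewrites the manifest, so each tagged release of the repository carries a description of exactly what it aggregates.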
66. FARSITE
coded descriptions of
clinical study cohorts
an NHS tool to assess the
feasibility of gathering a cohort
packages codes,
study, and metadata
Home
Baking
68. integrated database and journal
http://www.gigasciencejournal.com
galaxy.cbiit.cuhk.edu.hk
[Peter Li]
69. Nanopub: represents structured
data along with its provenance in a
single publishable and citable entry
Galaxy workflows: re-enact the analysis
Research Object:
aggregates the
(digital) resources
contributing to
findings of
(computational)
research (results,
data and software)
as citable
compound digital
objects
http://isa-tools.github.io/soapdenovo2/
http://sandbox.wf4ever-project.org/portal/ro?ro=http://sandbox.wf4ever-project.org/rodl/ROs/SOAP2denovo2-Aureus/
[Alejandra Gonzalez-Beltran
Philippe Rocca-Serra]
70. what’s the least we can do?
how might ROs be minted and used by science teams?
how might ROs be implemented and used by developer teams?
Standards
Models
Platforms
Id Schemes
Resolution
Light touch
Extensible
Infiltration
Mapping
Making,
Curating, Using
Nudging
Sharing
Linking
Infiltration
Embedding into
and changing
work practices
TOOLS
Citing
Technical Social
Reward
Mixed stewardship
Citation
Schemes
Fragility
73. Stealthy not Sneaky
to reduce the friction
instrument the world
Incremental
JIJIT not JIC
Focus on Personal
Productivity
not Public Good
Auto-magical
From made reproducible to born reproducible
What’s the least we can do?
74. Knowledge Turns
Transportation & Mediation
Unit of Scholarly Currency
Context, Comparison
Distributed: Search, Discover, Index, Harvest, Port
Research Turns
Release model: Evolution, Emergence,
Discourse, Comparison, Historical review
Forks, Merges & Fixivity
Flow across groups, projects and articles
Anti-Salami, Threaded Publications
Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013
Profile Focus
Body of knowledge around methods, workflows,
software, data, person, rather than publication.
First class citation, credit and respect
75. Open Research Practice is (increasingly) like
Open Source Software Practice.
(Which we know a lot about)
76. FAIR research practice benefits from a shared and
principled approach for identification, aggregation
and annotation of research components of all kinds.
– Using existing standards, vocabularies, frameworks,
platforms, infrastructures. Using linked data and
semantic interoperability
VIVO - to represent the
full context of
researchers’ work.
SciTS – to study the
research process and
research collaboration
78. • Barend Mons
• Sean Bechhofer
• Philip Bourne
• Matthew Gamble
• Raul Palma
• Jun Zhao
• Alan Williams
• Stian Soiland-Reyes
• Paul Groth
• Tim Clark
• Juliana Freire
• Alejandra Gonzalez-Beltran
• Philippe Rocca-Serra
• Ian Cottam
All the members of the Wf4Ever team
iSOCO: Intelligent Software Components S.A.,
Spain
University of Manchester, School of Computer
Science, Manchester, United Kingdom
University of Oxford, Department of Zoology,
Oxford, UK
Poznan Supercomputing and Networking
Center, Poznan, Poland
IAA: Instituto de Astrofísica de Andalucía,
Granada, Spain
Leiden University Medical Centre, Centre for
Human and Clinical Genetics, The Netherlands
Colleagues in Manchester’s Information
Management Group
RO Advisory Board Members
http://www.researchobject.org
http://www.wf4ever-project.org