Martone acs presentation
1. Surveying the Biomedical Resource Landscape
Maryann E. Martone, Ph.D.
Professor Emeritus
University of California, San Diego
and
Director of Biosciences
Hypothesis
3. [Chart: registry resources by type (Database, Software Application, Data Analysis Service, Topical Portal, Core Facility, Ontology, Software Resource) and year]
NIF is an initiative of the NIH Blueprint consortium of institutes
– NIF has been tracking and cataloging the biomedical resource landscape since 2008
4. NIF: A New Type of Entity for New Modes of Scientific Dissemination
• NIF’s mission is to maximize the awareness of, access to and utility of
digital resources produced worldwide to enable better science and
promote efficient use
– NIF was one of the first attempts to unite neuroscience information without
respect to domain, funding agency, institute or community
– Confront the scale, dynamism of domain and fluidity of technology
– Thought about global search across independently maintained
resources
– NIF is a library for scholarly output that is a web-enabled resource and not a paper; a PubMed and PubMed Central for things that aren’t articles
– Aggregates and tracks all the different databases, tools and resources now
produced by the scientific community
– Makes them searchable from a single interface
– Educate neuroscientists and students about effective data sharing
http://neuinfo.org
5. Organizing framework and portal for data
• Dashed lines: mapping of metadata, standards, links to aggregators, datasets
• Aggregators: repositories or various indices whose metadata are or can be mapped into Commons metadata
• Data: digital objects
A data discovery index for biomedicine
There is work for everyone (and more)
datamed.biocaddie.org (v0.5) alpha testing
6. Registry vs Data index: Metadata about resource vs metadata/data in database
With the thousands of databases and other information sources available, simple descriptive metadata will not suffice
Each source is categorized and presents custom facets; integrated views
7. SciCrunch: A “social network” for resources
• NIF is a general search engine
across neuroscience and
biomedicine
• Many communities want to
create more focused portals
• Own brand
• Own view
• How do we create a system that
satisfies community needs
without creating another silo?
• SciCrunch: Configurable portals
on top of shared resource pools
9. Semantic Information Framework
• Aggregate of community ontologies with some extensions for neuroscience
• Available as services through SciCrunch and BioPortal -> SciGraph (Neo4j-based)
• SciCrunch uses ontologies to enhance search and discovery but is not constrained by them
[Diagram: NIFSTD modules: Organism, Molecule, Investigation, Subcellular structure, Cell, Dysfunction, Quality, Anatomical Structure, NS Function]
11. Creating a Data and Resource Discovery Environment
Domain Knowledge
• Ontologies
• Atlases/Maps
Claims, assertions
• Registries
• Annotation
• Models and simulations
• Analyses
Data
• Databases
• Data sets
• Derived data
Literature
Search and Discovery
We cannot shoe-horn everything into a single representation or system; instead, figure out how information (data + knowledge) can flow between them. Knowledge is fluid and will continually update.
12. e-Science Ecosystem
• The digital world runs on globally unique and persistent identifiers (PIDs); PIDs serve as a “key” for identifying the same entity across different contexts
• No resource provider is an island: ensure your objects are FAIR
[Diagram labels: ORCID, RRID, DOI, PID, CDE; People, Research resources, Data, Articles, Software, Protocols, Ontology, Concepts; Metadata standards, Citation standards, Minimal Information Models; Search and discovery; Translation, Non-digital, Digital; Repositories and Registries; Repositories, Registries, Aggregators, Social platforms, Workflow platforms]
14. Making research objects FAIR
– You (and the machine) have to be able to
find it
• Accessible through the web
• Annotations
• Stable links and unique identifiers
– You have to be able to use it
• Data type specified and in a usable form
– You have to know what the data mean
• Some semantics
• Context: Experimental metadata
– You have to be able to cite it
• Provenance: Where did the data come from?
Make your data FAIR: Findable, Accessible, Interoperable, Reusable
https://www.force11.org/group/fairgroup
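The checklist on this slide can be read as a set of tests applied to a metadata record. The following is a minimal sketch under stated assumptions: the `ResearchObject` fields and the `fair_gaps` function are hypothetical names chosen for illustration, not part of any FORCE11 or NIF specification, and a real FAIR assessment would involve far more than checking that fields are non-empty.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResearchObject:
    """Hypothetical metadata record for a dataset or tool (illustrative only)."""
    identifier: Optional[str] = None       # stable link / unique identifier
    url: Optional[str] = None              # location accessible through the web
    data_format: Optional[str] = None      # declared data type, in a usable form
    vocabulary_terms: list = field(default_factory=list)       # some semantics
    experimental_metadata: dict = field(default_factory=dict)  # context
    provenance: Optional[str] = None       # where the data came from


def fair_gaps(obj):
    """Return the slide's checklist items that this record does not yet meet."""
    checks = {
        "find it: stable link and unique identifier": bool(obj.identifier),
        "find it: accessible through the web": bool(obj.url),
        "use it: data type specified": bool(obj.data_format),
        "know what it means: some semantics": bool(obj.vocabulary_terms),
        "know what it means: experimental metadata": bool(obj.experimental_metadata),
        "cite it: provenance": bool(obj.provenance),
    }
    return [name for name, ok in checks.items() if not ok]


# Placeholder values, not real identifiers.
partial = ResearchObject(identifier="example-id-001", url="https://example.org/data")
for gap in fair_gaps(partial):
    print("missing:", gap)
```

A record that only has an identifier and a URL is findable but still fails the usability, meaning, and citation checks, which is the point the slide makes.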
15. Resource Identification Initiative: Linking resources to literature
• Have authors supply appropriate
identifiers for key resources used
within a study such that they are:
– Machine processible (i.e., unique
identifier that resolves to a single
resource)
– Outside of the paywall
– Uniform across journals and publishers
• Pilot project: SciCrunch portal
serving identifiers for
– Software/databases (NIF RR)
– Antibodies (NIF AB Registry)
– Genetically modified organisms (NIF
aggregation)
Absolutely reliant on comprehensive registries to enforce uniqueness, persistence and
consistent metadata
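To illustrate what "machine processible" buys, here is a rough sketch of pulling RRIDs out of a methods paragraph with a regular expression. The pattern is a simplified assumption about the identifier shape ("RRID:" followed by a registry prefix, an underscore, and a local ID); it is not an official grammar, and the identifiers in the sample text are placeholders rather than real registry entries.

```python
import re

# Assumed, simplified RRID shape: "RRID:" + registry prefix + "_" + local ID.
# The final character must be alphanumeric so trailing punctuation is excluded.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_:\-]*[A-Za-z0-9])")


def extract_rrids(text):
    """Return the unique RRIDs found in text, in order of first appearance."""
    seen = []
    for match in RRID_PATTERN.finditer(text):
        rrid = "RRID:" + match.group(1)
        if rrid not in seen:
            seen.append(rrid)
    return seen


# Placeholder identifiers for illustration only.
methods = (
    "Analysis used a software tool (RRID:SCR_000000) and an antibody "
    "(RRID:AB_0000000); the tool (RRID:SCR_000000) was rerun on all samples."
)
print(extract_rrids(methods))
```

Because each identifier resolves to a single resource, the extracted strings can be compared directly across journals and publishers without any name matching.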
16. What studies used...
Type an RRID into Google Scholar to return a list of papers that use that resource
>700 papers
>90 journals
1000s of RRIDs
17. Resource IDs from NIF aggregated databases
• A single portal for authors
• >15 authoritative databases
• One search interface
• Not just my research resource
• Thinking globally about infrastructure
RII Portal
http://scicrunch.org/resources
Utilized NIF/SciCrunch infrastructure (NIH Blueprint; NIDDK)
18. Linking data to Literature: Joint Declaration of Data Citation Principles
• Synthesis of data
citation principles
– >25 groups
participating
• Designed to be high
level and easy to
understand
• Supplemented with a
glossary, references
and examples
http://www.force11.org/datacitation
1. Importance
2. Credit and attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and verifiability
8. Interoperability and flexibility
20. hypothes.is: Web annotation
• Works as an independent layer over any web page or PDF (images, video and data coming)
• Open source
• Standards based
• Easy-to-use
https://hypothes.is/annotating-all-knowledge/
21. Neuroscientist annotating her own paper to provide updates and additional
information
An interactive knowledge layer
22. Conclusions
• Investments in infrastructure, both successful and unsuccessful, have laid the foundations for a functioning ecosystem
• Comprehensive registries, repositories and aggregators play
a key role in providing stable and useful representations of
key digital entities
• Persistence is a social contract
• Population is key
• i.e., people and organizations are in the mix!
• Need to think globally across the workflow
• FORCE11 coordinating, collating and organizing principles
that govern flow of research objects within the ecosystem
• New technologies are constantly arising
Figure X: Resource types and year added to the registry. Research resources are each tagged with one or more resource types; the most common are represented in this graph (for all data, see http://neurolex.org/wiki/Resource_Type_Hierarchy). The year that a resource was added to the registry is denoted by color; note that 2009 and earlier data are lumped into 2010.
The core of the DDI (Data Discovery Index) is the indexing of data (digital objects) drawn from data repositories. Sources include individual repositories (e.g. PDB) as well as aggregators (e.g. OmicsDI, LINCS) that draw from multiple repositories. bioCADDIE is not meant to replace existing indices; it is a way for the community to work together to maximize discoverability on top of them. bioCADDIE indexes the data and provides a search engine that users can access through a user interface.
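As a toy illustration of indexing metadata from several sources behind one search function, consider the sketch below. It is not bioCADDIE's implementation: the record layout, the source names, and the `build_index` and `search` helpers are all hypothetical, and a production index would add ranking, richer tokenization, and metadata mapping.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase token in a record's title/description to record IDs."""
    index = defaultdict(set)
    for rec in records:
        text = f"{rec['title']} {rec['description']}"
        for raw in text.lower().split():
            token = raw.strip(".,;()")
            if token:
                index[token].add(rec["id"])
    return index


def search(index, query):
    """Return IDs of records that contain every query term (AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical records: one from an individual repository, one from an aggregator.
records = [
    {"id": "repo-a:1", "source": "repository",
     "title": "Protein structure of kinase",
     "description": "Crystal structure data."},
    {"id": "agg-b:7", "source": "aggregator",
     "title": "Kinase expression profiles",
     "description": "Aggregated omics data."},
]
index = build_index(records)
print(sorted(search(index, "kinase")))
print(sorted(search(index, "kinase structure")))
```

The index sits on top of the sources rather than replacing them: each hit is only an identifier pointing back to the repository or aggregator that holds the data.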