The real world of ontologies and phenotype representation: perspectives from the Neuroscience Information Framework
1. The real world of ontologies and phenotype representation: perspectives from the Neuroscience Information Framework
Maryann Martone, Ph.D.
University of California, San Diego
2. “Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation
to its multiple layers of organization that operate at different spatial and
temporal scales. Central to this effort is tackling “neural choreography” --
the integrated functioning of neurons into brain circuits. Neural
choreography cannot be understood via a purely reductionist approach.
Rather, it entails the convergent use of analytical and synthetic tools to
gather, analyze and mine information from each level of analysis, and
capture the emergence of new layers of function (or dysfunction) as we
move from studying genes and proteins, to cells, circuits, thought, and
behavior....
However, the neuroscience community is not yet fully engaged in exploiting the
rich array of data currently available, nor is it adequately poised to capitalize
on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
3. “Data choreography”
In that same issue, Science reported asking peer reviewers from last year about the
availability and use of data:
About half of those polled store their data only in their
laboratories—not an ideal long-term solution.
Many bemoaned the lack of common metadata and archives as a
main impediment to using and storing data, and most of the
respondents have no funding to support archiving
And even where accessible, much data in many fields is too poorly
organized to enable it to be efficiently used.
“...it is a growing challenge to ensure that data produced during the
course of reported research are appropriately described, standardized,
archived, and available to all.” Lead Science editorial (Science 11
February 2011: Vol. 331 no. 6018 p. 649)
4. NIF is an initiative of the NIH Blueprint consortium of institutes
What types of resources (data, tools, materials, services) are
available to the neuroscience community?
How many are there?
What domains do they cover? What domains do they not cover?
Where are they?
• Web sites
• Databases
• Literature
• Supplementary material
• PDF files
• Desk drawers
Who uses them?
Who creates them?
How can we find them?
How can we make them better in the future?
http://neuinfo.org
5. In an ideal world...
We’d like to be able to find:
What is known:
What is the average diameter of a Purkinje neuron?
Is GRM1 expressed in cerebral cortex?
What are the projections of hippocampus?
What genes have been found to be upregulated in chronic drug abuse in adults?
Is alpha synuclein in the striatum?
What studies used my polyclonal antibody against GABA in humans?
What rat strains have been used most extensively in research during the last 20 years?
What is not known:
Connections among data
Gaps in knowledge
Required Components:
– Query interface
– Search strategies
– Data sources
– Infrastructure
– Results display
– Why did I get this result?
– Analysis tools
Without some sort of framework, very difficult to
6. The Neuroscience Information Framework: Discovery and
utilization of web-based resources for neuroscience
UCSD, Yale, Cal Tech, George Mason, Washington Univ
A portal for finding and using neuroscience resources
A consistent framework for describing resources
Provides simultaneous search of multiple types of information, organized by category
Supported by an expansive ontology for neuroscience
Utilizes advanced technologies to search the “hidden web”
[Figure labels: Literature, Database Federation, Registry]
Supported by NIH Blueprint
http://neuinfo.org
7. We need more databases!?
•NIF Registry: A catalog of neuroscience-relevant resources
•> 5000 currently listed
•> 2000 databases
•And we are finding more every day
8. NIF must work with ecosystem as it is today
NIF was one of the first projects to attempt data integration in
the neurosciences on a large scale
NIF is supported by a contract that specified the number of
resources to be added per year
Designed to be populated rapidly; set up process for progressive refinement
No budget was allocated to retrofit existing resources; had to work with
them in their current state
We designed a system that required little to no cooperation or work from
providers
NIF was required to assemble (not create) ontologies very fast and to provide a
platform through which the community could view, comment and add
NIF is enriched by ontologies but does not depend on them
Took advantage of community ontologies
But needed to take a very pragmatic and aggressive approach to incorporating and using them
Neurolex semantic wiki
9. What are the connections of the hippocampus?
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
Query expansion: Synonyms and related concepts
Boolean queries
Data sources categorized by “data type” and level of nervous system
Tutorials for using full resource when getting there from NIF
Common views across multiple sources
Link back to record in original source
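The query expansion shown on this slide can be sketched in a few lines of Python. This is only an illustration, not NIF's implementation: the `SYNONYMS` table and the `expand_query` function are hypothetical, and only the hippocampus synonyms are taken from the slide.

```python
# Hypothetical synonym table; only the hippocampus entry comes from the slide.
SYNONYMS = {
    "hippocampus": ["Hippocampus", "Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term: str) -> str:
    """Expand a search term into a Boolean OR query over its synonyms."""
    # Terms without a synonym entry are searched as typed.
    variants = SYNONYMS.get(term.lower(), [term])
    # Quote multi-word variants so a search engine treats them as phrases.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)


print(expand_query("hippocampus"))
```

A term with no entry in the table is passed through unchanged, so the expansion never narrows what the user asked for.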
10. Imminent: NIF 5.0
NIF 5.0 about to be released
New design
New query features
New analytics
11. What do you mean by data?
Databases come in many shapes and sizes
Primary data:
Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
Secondary data
Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
Tertiary data
Claims and assertions about the meaning of data
E.g., gene upregulation/downregulation, brain activation as a function
Registries:
Metadata
Pointers to data sets or materials stored elsewhere
Data aggregators
Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
Single source
Data acquired within a single context, e.g., Allen Brain Atlas
Researchers are producing a variety of information resources using a multitude of technologies
12. Exploration: Where is alpha synuclein?
•Spatially:
•Gene
•Protein
•Subcellular
•Cellular
•Regional
•Organism
•Semantically:
•Gene regulation networks
•Protein pathways
•Cellular local connectivity
•Regional connectivity
•Who is studying it?
•Who is funding its study?
Networks exist across scales; all important in the nervous system
13. NIFSTD Ontologies
Set of modular ontologies
86,000+ distinct concepts + synonyms
Bridge files between modules
Expressed in OWL-DL language
Currently supports OWL 2
Tries to follow OBO community best practices
Standardized to the same upper level ontologies
e.g., Basic Formal Ontology (BFO), OBO Relations Ontology (OBO-RO)
Imports existing community ontologies
e.g., CHEBI, GO, PRO, DOID, OBI etc.
Retains identifiers in most recent additions but reflects history
Covers major domains of neuroscience:
Organisms, Brain Regions, Cells, Molecules, Subcellular parts, Diseases, Nervous system functions, Techniques
Fahim Imam, William Bug
14. “Search computing”: Query by concept
What genes are upregulated by drugs of abuse in the
adult mouse? (show me the data!)
Morphine
Increased
expression
Adult Mouse
Reasonable standards make it easy to search for and compare results
15. New: Data analytics
Diseases of nervous system
Neoplastic disease of nervous system
NIF data federated sources
Neurodegenerative
Seizure disorders
NIH
Reporter
NIF is in a unique position to answer questions about the neuroscience
ecosystem using new analytics tools
16. Results are organized within a common framework
Target site
Synapsed by
innervates Connects to
Input region
Synapsed with
Cellular contact
Projects to
Axon innervates
Subcellular contact
Source site
Each resource implements a different, though related model;
systems are complex and difficult to learn, in many cases
18. The scourge of neuroanatomical nomenclature:
Importance of NIF semantic framework
•NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
•Brain Architecture Management System (rodent)
•Temporal-lobe.com (rodent)
•Connectome Wiki (human)
•Brain Maps (various)
•CoCoMac (primate cortex)
•UCLA Multimodal database (Human fMRI)
•Avian Brain Connectivity Database (Bird)
•Total: 1800 unique brain terms (excluding Avian)
•Number of exact terms used in > 1 database: 42
•Number of synonym matches: 99
•Number of 1st order partonomy matches: 385
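The three kinds of match counted above (exact term, synonym, first-order partonomy) can be sketched as follows. This is a minimal illustration under stated assumptions: the `SYNONYMS` and `PART_OF` tables and the function names are hypothetical stand-ins, not NIFSTD content, and "exact" here ignores case and surrounding whitespace.

```python
# Hypothetical lookup tables for illustration; not taken from NIFSTD.
SYNONYMS = {"ammon's horn": "hippocampus", "cornu ammonis": "hippocampus"}
PART_OF = {"ca1": "hippocampus", "dentate gyrus": "hippocampal formation"}


def canonical(term: str) -> str:
    """Map a term to its preferred label, if a synonym is known."""
    t = term.strip().lower()
    return SYNONYMS.get(t, t)


def match_type(term_a: str, term_b: str) -> str:
    """Classify how two brain-region terms from different databases relate."""
    a, b = term_a.strip().lower(), term_b.strip().lower()
    if a == b:
        return "exact"
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return "synonym"
    # First-order partonomy: one term is the direct parent of the other.
    if PART_OF.get(ca) == cb or PART_OF.get(cb) == ca:
        return "partonomy"
    return "none"


print(match_type("CA1", "Ammon's horn"))
```

Checking in this order mirrors the slide: each looser match type is only tried when the stricter one fails.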
19. Why so many names?
The brain is perhaps unique among major organ systems in the
multiplicity of naming schemes for its major and minor regions.
The brain has been divided based on topology of major features,
cyto- and myelo-architecture, developmental boundaries,
supposed evolutionary origins, histochemistry, gene expression
and functional criteria.
The gross anatomy of the brain reflects the underlying networks
only superficially, and thus any parcellation reflects a somewhat
arbitrary division based on one or more of these criteria.
The “activation map” images that commonly accompany brain imaging papers can be
misleading to inexperienced readers, by seeming to suggest that the boundaries between
“activated” and “unactivated” patches of cortex are unambiguous and sharp. Instead, as
most researchers are aware, the apparent sharp boundaries are subject to the choice of
threshold applied to the statistical tests that generate the image. What, then, justifies
dividing the cortex into regions with boundaries based on this fuzzy, mutable measure of
functional profile?
(Saxe et al., 2010, p. 39).
Brainmaps.org
20. Program on Ontologies for Neural Structures
International Neuroinformatics Coordinating Facility
Structural Lexicon Task Force
Defining brain structures
Translate among terminologies
Neuronal Registry Task Force
Consistent naming scheme for neurons
Knowledge base of neuron properties
Representation and Deployment Task Force
Formal representation
Also interacts with Digital Atlasing Task Force
http://incf.org
21. NeuroLex Wiki
•Provide a simple framework for defining the concepts required
•Light weight semantics
•Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework
•Community based:
•Anyone can contribute their terms, concepts, things
•Anyone can edit
•Anyone can link
•Accessible: searched by Google
•Building an extensive cross-disciplinary knowledge base for neuroscience
Demo D03
http://neurolex.org
Stephen Larson
22. Defining nervous system structures
Parcellation scheme: Set of parcels occupying part or all of an anatomical entity that has been delineated using a common approach or set of criteria, often in a single study. A parcellation scheme for any given individual entity may include gaps, transitional zones, or regions of uncertainty. A parcellation scheme derived from a set of individuals registered to a common target (atlas) may be probabilistic and include overlap of parcels in regions that reflect individual variability or imperfections in alignment.
Documentation available
INCF task force on
14 parcellation schemes currently represented in Neurolex ontologies
23. Basic model: do not conflate conceptual structures with parcels
overlaps
Regional part of Parcel
nervous system
overlaps overlaps
Functional part of
nervous system
Parcel Parcel
Neuroscientists have a lot of different parcellation schemes because they have a lot of different
ways of classifying brain structures, and techniques to match them are imperfect
24. Linking semantics to space: INCF Atlasing
www.neurolex.org
Waxholm space
Link to spatial
representation in
scalable brain
atlas
Seth Ruffins, Alan Ruttenberg, Rembrandt Bakker
25. Neurons in Neurolex
International Neuroinformatics Coordinating Facility (INCF) building a knowledge base of neurons and their properties via the Neurolex Wiki
Led by Dr. Gordon Shepherd
Consistent and parseable naming scheme
Knowledge is readily accessible, editable and computable
While structure is imposed, don’t worry too much about the upper level classes of the ontology
Stephen Larson
26. A KNOWLEDGE BASE OF NEURONAL PROPERTIES
Additional semantics added in NIFSTD by ontology engineer
27. Concept-based search: search by meaning
Search Google: GABAergic neuron
Search NIF: GABAergic neuron
NIF automatically searches for types of
GABAergic neurons
Types of GABAergic
neurons
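Concept-based search of this kind can be sketched as expanding a query to every subclass of the concept. The sketch below is an assumption-laden illustration, not NIF's code: the `IS_A` hierarchy is a small hypothetical stand-in for the ontology, and the function names are invented for the example.

```python
# Hypothetical is-a hierarchy (child -> parent), standing in for the ontology.
IS_A = {
    "Purkinje cell": "GABAergic neuron",
    "Cerebellar Golgi cell": "GABAergic neuron",
    "GABAergic neuron": "Neuron",
    "Cerebellar granule cell": "Glutamatergic neuron",
    "Glutamatergic neuron": "Neuron",
}


def descendants(concept: str) -> list:
    """Return every direct or indirect subclass of a concept, sorted by name."""
    found = []
    frontier = [concept]
    while frontier:
        parent = frontier.pop()
        for child, child_parent in IS_A.items():
            if child_parent == parent:
                found.append(child)
                frontier.append(child)
    return sorted(found)


def concept_search_terms(concept: str) -> list:
    """Search terms for a concept: the concept itself plus all its subclasses."""
    return [concept] + descendants(concept)


print(concept_search_terms("GABAergic neuron"))
```

A plain keyword search would only match the literal string; the expansion is what lets a record annotated with a specific cell type be found by a query for its class.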
28. Challenges of multiscale neurodegenerative disease phenotypes
Midbrain degenerated
Substantia nigra decreased in volume
Substantia nigra pars compacta atrophied
Loss of Snpc dopaminergic neurons
Degeneration of nigrostriatal terminals
Tyrosine-hydroxylase containing neurons degenerate
[Figure: “not” labels appear between the first three phenotype statements]
•Neurodegenerative diseases target very specific cell populations
•Model systems only replicate a subset of features of the disease
•Related phenotypes occur across anatomical scales
•Different vocabularies are used by different communities
29. Approach: Use ontologies to provide necessary knowledge for matching related phenotypes
[Figure: ontology graph linking entities and qualities]
Entities: Midbrain; Substantia nigra; Substantia nigra pars compacta; Substantia nigra pars compacta dopamine cell; Neuron (CL); Neuron cell soma; Part of neuron (GO); Dopamine; Small molecule (Chebi)
Relations: Has part; Is part of; Is a
Qualities: Degenerate; Atrophied; Decreased in magnitude relative to some normal; Decreased volume; Fewer in number
NIFSTD/PKB, OBO ontology
Sarah Maynard, Chris Mungall, Suzie Lewis, Fahim Imam
30. EQ Representation of Phenotypes in Neurodegenerative Disease: PATO and NIFSTD
[Figure: EQ annotation graph. Nodes: Human (birnlex_516); Instance: Human with Alzheimer’s disease 050; Alzheimer’s disease; Neocortex pyramidal neuron; Increased number of; Lipofuscin; Phenotype birnlex_2087_56. Edge labels: has part, inheres in, towards, about]
Structured annotation model implemented in WIB
Chris Mungall, Suzanna Lewis
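An entity-quality (EQ) statement like the one in this figure can be held in a small record type. The sketch below is illustrative only: the `EQPhenotype` class and its field names are hypothetical, and the way the figure's nodes are assigned to fields is one reading of the diagram, not the authors' data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQPhenotype:
    """Hypothetical container for one entity-quality phenotype statement."""
    entity: str                     # E: the structure the quality inheres in
    quality: str                    # Q: a PATO-style quality
    towards: Optional[str] = None   # related entity for relational qualities
    bearer: Optional[str] = None    # organism instance the phenotype is about

    def describe(self) -> str:
        text = f"{self.quality} inheres in {self.entity}"
        if self.towards:
            text += f" towards {self.towards}"
        return text


# Assumed reading of the figure: more lipofuscin in neocortical pyramidal neurons.
lipofuscin = EQPhenotype(
    entity="Neocortex pyramidal neuron",
    quality="Increased number of",
    towards="Lipofuscin",
    bearer="Human with Alzheimer's disease 050",
)
print(lipofuscin.describe())
```

Keeping entity and quality as separate fields is what allows each to be matched against its own ontology when phenotypes are compared.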
31. OBD: Ontology based database
Provides a user interface for matching organisms based on similarity of phenotypes
Based on EQ model
Uses knowledge in the ontology to compute similarity scores and other statistical measures like information content
Chris Mungall, Suzanna Lewis, Lawrence Berkeley Labs
http://www.berkeleybop.org/pkb/
32. Computes common subsumers and information content among phenotypes
Thalamus
Midline nuclear Paracentral
group nucleus
Cellular Cellular
Lewy Body inclusion inclusion
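Common subsumers and information content can be sketched as below. The slide does not give the formula used, so this assumes the common definition IC(t) = -log2(p(t)), where p(t) is the fraction of annotations at or below term t. The hierarchy, the counts and the function names are all hypothetical, chosen only to make the example run.

```python
import math

# Hypothetical parent links (is-a / part-of) for illustration only.
PARENTS = {
    "Midline nuclear group": ["Thalamus"],
    "Paracentral nucleus": ["Thalamus"],
    "Thalamus": ["Brain region"],
    "Lewy body": ["Cellular inclusion"],
}

# Hypothetical number of annotations at or below each term, out of TOTAL.
ANNOTATION_COUNTS = {
    "Brain region": 100,
    "Thalamus": 10,
    "Midline nuclear group": 4,
    "Paracentral nucleus": 2,
}
TOTAL = 100


def ancestors(term: str) -> set:
    """All subsumers of a term, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term: str) -> float:
    """Rarer terms carry more information: IC = -log2(frequency)."""
    return -math.log2(ANNOTATION_COUNTS[term] / TOTAL)


def most_informative_common_subsumer(a: str, b: str):
    """The shared ancestor with the highest information content, or None."""
    common = ancestors(a) & ancestors(b)
    if not common:
        return None
    return max(common, key=information_content)


print(most_informative_common_subsumer("Midline nuclear group", "Paracentral nucleus"))
```

Two phenotypes that only share a very general ancestor get a low score, which is the behavior a similarity measure needs.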
33. PhenoSim: What organism is most similar to a human with Huntington’s disease?
[Figure: matched phenotype statements]
Part of basal ganglia decreased in magnitude
Globus pallidus neuropil degenerate
Putamen atrophied
Neuron in striatum decreased in magnitude
Fewer neostriatum medium spiny neurons in putamen
Neurons in striatum degenerate
Nervous system cell change in number in striatum
Increased number of astrocytes in caudate nucleus
Neurons in striatum degenerate
*B6CBA-TgN(HDexon1)62 that express exon1 of the human mutant HD gene, Li et al., J Neurosci, 21(21):8473-8481
34. Progressive enrichment
Understanding and comparing phenotypes will be enriched through community
knowledge bases like Neurolex
Looking forward to continuing this as part of the Monarch project with Melissa
Haendel, Chris Mungall and Suzie Lewis
35. Top Down vs Bottom up
Top-down ontology construction (NIFSTD)
• A select few authors have write privileges
• Maximizes consistency of terms with each other (automated consistency checking)
• Making changes requires approval and re-publishing
• Works best when domain to be organized has: small corpus, formal categories, stable entities, restricted entities, clear edges.
• Works best with participants who are: expert catalogers, coordinated users, expert users, people with authoritative source of judgment
Bottom-up ontology construction (NEUROLEX)
• Multiple participants can edit the ontology instantly (many eyes to correct errors)
• Semantics are limited to what is convenient for the domain
• Not a replacement for top-down construction; sometimes necessary to increase flexibility
• Necessary when domain has: large corpus, no formal categories, no clear edges
• Necessary when participants are: uncoordinated users, amateur users, naïve catalogers
• Neuroscience is a domain that is less formal and neuroscientists are more uncoordinated
Important for Ontologists to define community contribution model
36. It’s a messy ecosystem (and that’s OK)
NIF favors a hybrid, tiered, federated system
Domain knowledge: Ontologies (Gene, Organism, Neuron, Brain part, Disease)
Claims about results: Virtuoso RDF triples (Caudate projects to Snpc; Grm1 is upregulated in chronic cocaine; Betz cells degenerate in ALS)
Data: Data federation, Workflows
Narrative: Full text access
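The three example claims on this slide fit a subject-predicate-object shape, which is what an RDF triple store holds. The sketch below uses plain Python tuples rather than a real triple store; the predicate names and the `match` helper are hypothetical.

```python
# Claims from the slide as (subject, predicate, object) triples.
# The predicate names are hypothetical labels chosen for the example.
CLAIMS = [
    ("Caudate", "projects_to", "Snpc"),
    ("Grm1", "is_upregulated_in", "chronic cocaine"),
    ("Betz cells", "degenerate_in", "ALS"),
]


def match(pattern, triples=CLAIMS):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == value for p, value in zip(pattern, t))
    ]


# What does the caudate project to?
print(match(("Caudate", "projects_to", None)))
```

Pattern matching with wildcards is a toy version of what a SPARQL query does over a store such as Virtuoso.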
37. Musings from the NIF
No one can be stopped from doing what they need to do
Every resource is resource limited: few have enough time,
money, staff or expertise required to do everything they would
like
If the market can support 11 MRI databases, fine
Some consolidation, coordination is warranted though
Big, broad and messy beats small, narrow and neat
Without trying to integrate a lot of data, we will not know what needs to be done
A lot can be done with messy data; neatness helps though
Progressive refinement; addition of complexity through layers
Be flexible and opportunistic
A single optimal technology/container for all types of scientific data and
information does not exist; technology is changing
Think globally; act locally:
No source, not even NIF, is THE source; we are all a source
38. Grabbing the long tail of small data
Analysis of NIF shows multiple databases with similar scope and content
Many contain partially overlapping data
Data “flows” from one resource to the next
Data is reinterpreted, reanalyzed or added to
Is duplication good or bad?
39. Same data: different analysis
Drug Related Gene database: extracted statements from figures, tables and supplementary data from published article
Gemma: Reanalyzed microarray results from GEO using different algorithms
Both provide results of increased or decreased expression as a function of experimental paradigm
4 strains of mice
3 conditions: chronic morphine, acute morphine, saline
Chronic vs acute morphine in striatum
Mined NIF for all references to GEO IDs: found small number where the same dataset was represented in two or more databases
http://www.chibi.ubc.ca/Gemma/home.html
40. How easy was it to compare?
Gemma: Gene ID + Gene Symbol
DRG: Gene name + Probe ID
Gemma: Increased expression/decreased expression
DRG: Increased expression/decreased expression
(NIF annotation standard)
But...Gemma presented results relative to baseline chronic morphine; DRG with
respect to saline, so direction of change is opposite in the 2 databases
Analysis:
1370 statements from Gemma regarding gene expression as a function of chronic morphine
617 were consistent with DRG; over half of the claims of the paper were not
confirmed in this analysis
Results for 1 gene were opposite in DRG and Gemma
45 did not have enough information provided in the paper to make a judgment
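The baseline problem described above can be handled by re-expressing every statement against the same reference condition before comparing. The sketch below is an illustration only: the gene names and records are made up, the function names are hypothetical, and it assumes each statement is a simple two-condition contrast so that swapping the baseline flips the direction.

```python
def flip(direction: str) -> str:
    """Reverse a direction of change."""
    return {"increased": "decreased", "decreased": "increased"}[direction]


def normalize(direction: str, baseline: str, target_baseline: str = "saline") -> str:
    """Re-express a direction of change relative to target_baseline.

    Assumes a two-condition contrast, so a different baseline means the
    reported direction must be reversed.
    """
    return direction if baseline == target_baseline else flip(direction)


def compare(gemma: dict, drg: dict) -> dict:
    """Compare gene -> (direction, baseline) records from two sources."""
    result = {}
    for gene in gemma.keys() & drg.keys():
        g = normalize(*gemma[gene])
        d = normalize(*drg[gene])
        result[gene] = "consistent" if g == d else "opposite"
    return result


# Made-up records: one source reports relative to chronic morphine, the other to saline.
gemma = {"GeneA": ("decreased", "chronic morphine"), "GeneB": ("increased", "chronic morphine")}
drg = {"GeneA": ("increased", "saline"), "GeneB": ("increased", "saline")}
print(compare(gemma, drg))
```

Without the normalization step, GeneA would look contradictory even though both sources agree.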
41. Beware of False Dichotomies
Top-down vs bottom up
Light weight vs heavy weight
“Chaotic Nihilists and Semantic Idealists”
Text mining vs annotation
Curators vs scientists
Human vs machine
DOIs vs URIs
http://www.datanami.com/datanami/2013-02-05/chaotic_nihilists_and_semantic_idealists.html
42. NIF team (past and present)
Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Karen Skinner, NIH, Program Officer
Fahim Imam, NIF Ontology Engineer
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Lee Hornbrook
Binh Ngo
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Editor's Notes
Doesn’t do it well; doesn’t organize the results in a domain-specific way; doesn’t search across it. For use as content goal Dynamic inventory for deep coverage of neuroscience data: Genes -> Systems
What animal models show
NIFSTD and PATO ontologies served as building blocks to build a phenotype model. The ontologies provide relationships between neuroscience-related terms, provide a structure to qualities, and allow related qualities to show relationships.
Need an interface to explore and ask questions. Cannot view as a graph. Need to be able to ask a question not in SPARQL and get an answer. Need a better interface to put things in. Discuss Neurolex and PKB. Doesn’t have to be a perfect interface, but has to allow a domain expert to ask and answer questions.
Indirect matches that match due to hierarchies. NOTE: should make diagram in the style of previous slides (not screenshot)
In validating our results, we see three types of matches. The first are direct matches. NOTE: should make diagram in the style of previous slides (not screenshot)