2. The Encyclopedia of Life
A…
Access to data has changed over the years
Tim Berners-Lee: Web of data
Wikipedia defines Linked Data as “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.”
http://linkeddata.org/
GenBank
PDB
3. The mountain of data problem
Would like to be able to find:
What is known (by combining data from different sources and different groups):
What is the average diameter of a Purkinje neuron?
Is GRM1 expressed in cerebral cortex?
What are the projections of the hippocampus?
What genes have been found to be upregulated in chronic drug abuse in adults?
What studies used my monoclonal mouse antibody against GAD in humans?
Find all instances of spines that contain membrane-bound organelles
What is not known:
Connections among data
Gaps in knowledge
Required Components:
– Query interface
– Search strategies
– Data sources
– Infrastructure
– Results display
– Trust
– Context
– Analysis tools
– Tools for translating existing content into linkable form
– Tools for creating new data ready to be linked
4. Where would you rather look?
Unstructured vs structured data
Publishing data in the literature/web pages vs databases and tables
5. Scale
Whole brain data (20 µm microscopic MRI)
Mosaic LM images (1 GB+)
Conventional LM images
Individual cell morphologies
EM volumes & reconstructions
Solved molecular structures
No single technology serves all of these equally well.
Multiple data types; multiple scales; multiple databases
A multi-scale data problem
A data federation problem
6. Two organizing frameworks for knowledge
Knowledge in space and spatial relationships (the “where”)
Knowledge in words, terminologies and logical relationships (the “what”)
7. Assembling data into coherent models
Snavely et al. Scene Reconstruction and Visualization from Community Photo Collections
8. What if...
The Matterhorn could be 15 different things?
There were 6 billion Matterhorns, all more or less different from one another?
The Roman Coliseum was called by 45 different names?
The photo represented 1/1,000,000 of the whole with no context?
Photos weren’t annotated at all or were tagged “1” or “mm45”?
The Statue of Liberty was represented as a mathematical equation? Or a scatter plot?
10. Curators vs researchers
• Example of segmented object names from CCDB for a Node of Ranvier:
• Mitochondria1
• Shwannlowermerge
• U.L.Cisternae
• Crop
• Loop7_lower
• Blue
• Alex
• Lysosomme_3
http://ccdb.ucsd.edu
• Alex left
• The program used to create the annotations is obsolete
11. The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience
A portal for finding and using neuroscience resources
A consistent framework for describing resources
Provides simultaneous search of multiple types of information, organized by category
Supported by an expansive ontology for neuroscience
Utilizes advanced technologies to search the “hidden web”
http://neuinfo.org
UCSD, Yale, Caltech, George Mason, Washington Univ
Supported by NIH Blueprint
12. Scale
Whole brain data (20 µm microscopic MRI)
Mosaic LM images (1 GB+)
Conventional LM images
Individual cell morphologies
EM volumes & reconstructions
Solved molecular structures
No single technology serves all of these equally well.
Multiple data types; multiple scales; multiple databases
A data federation problem
13. How many resources are there?
• NIF Registry: A catalog of neuroscience-relevant resources
• > 3500 currently described
• > 1500 databases
• Another 4000 awaiting curation
• And we are finding more every day
14. NIF Data Federation
Too many databases to visit
Capturing content in a few keywords is difficult if not impossible
Each is organized differently: different UIs, data models and tools
NIF provides tools for databases to register their content to NIF
Access to deep content; currently searches over 35 million records from > 65 different databases
Web services, schema registration, XML-based description, RDF
Organized according to level of nervous system and data type, e.g., brain activation foci
Enhanced keyword query interface
Link to host resource
Accompanied by a tutorial
Defines common data models for similar data
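To make “defines common data models for similar data” concrete, here is a minimal sketch under stated assumptions: two hypothetical sources store the same kind of record (brain activation foci) with different field names and layouts, and an adapter per source translates each row into one shared record type that can then be queried uniformly. The source schemas, field names, coordinate values and the millimetre convention are all made up for illustration; this is not NIF's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivationFocus:
    """Hypothetical common data model for a brain activation focus."""
    source: str
    region: str
    x_mm: float
    y_mm: float
    z_mm: float


def from_source_a(row: dict) -> ActivationFocus:
    # Hypothetical source A: one "x,y,z" string and an "area" field.
    x, y, z = (float(v) for v in row["coords"].split(","))
    return ActivationFocus("A", row["area"].lower(), x, y, z)


def from_source_b(row: dict) -> ActivationFocus:
    # Hypothetical source B: separate X/Y/Z columns and a "brain_region" field.
    return ActivationFocus(
        "B", row["brain_region"].lower(),
        float(row["X"]), float(row["Y"]), float(row["Z"]),
    )


def federate(rows_a: list, rows_b: list) -> list:
    """Translate every source row into the common model."""
    return [from_source_a(r) for r in rows_a] + [from_source_b(r) for r in rows_b]


records = federate(
    [{"coords": "-22,-24,-10", "area": "Hippocampus"}],
    [{"brain_region": "hippocampus", "X": "24", "Y": "-22", "Z": "-12"},
     {"brain_region": "Amygdala", "X": "20", "Y": "-4", "Z": "-18"}],
)
# One query now works across both sources.
hippocampal = [r for r in records if r.region == "hippocampus"]
```

The design choice is to keep each source untouched and put the translation in a thin adapter, so adding a new database means writing one more adapter rather than changing the shared model.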
15. Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
Query expansion: synonyms and related concepts
Boolean queries
Data sources categorized by “data type” and level of nervous system
Simplified views of complex data sources
Tutorials for using the full resource when arriving from NIF
Link back to the record in the original source
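The query on this slide is a synonym expansion: one term becomes a Boolean OR over its known synonyms. A minimal sketch of that step follows; the synonym table is hard-coded and hypothetical (in NIF such lists come from the ontology), and quoting multi-word terms as phrases is an assumption about the target search engine.

```python
# Hypothetical synonym table. In NIF, synonyms like these come from the
# NIFSTD ontology; here they are hard-coded for illustration.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term: str) -> str:
    """Quote multi-word terms so a search engine treats them as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term: str) -> str:
    """Build a Boolean OR query from a term and its known synonyms."""
    terms = [term] + SYNONYMS.get(term.lower(), [])
    return " OR ".join(quote(t) for t in terms)


print(expand_query("Hippocampus"))
# Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"
```

A term with no entry in the table is passed through unchanged, so expansion never narrows a query.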
16. NIF data federation...
Simultaneous access to multiple sources of information through a
concept-based interface
Unique resource for asking certain types of questions
e.g., what rat strains have been most commonly used in research
Indexes content in the hidden web not currently well served by search engines
A set of tools for making resources available through the NIF
A platform for data integration
Simplified and neuroscience-centered views of very complicated resources
An ontology for enhanced query and integration
A wealth of real information on the practical issues of search across and
integration of data in the neurosciences
Share experiences through publications, presentations, blogs and with other projects
Developing annotation standards that help with search
Provide best practices for resource creators
17. What are the connections of the hippocampus?
Connects to
Synapsed with
Synapsed by
Input region
innervates
Axon innervates
Projects to
Cellular contact
Subcellular contact
Source site
Target site
Each resource implements a different, though related, model; in many cases the systems are complex and difficult to learn
18. Is GRM1 in cerebral cortex?
NIF system allows easy search over multiple sources of information
But, we have difficulty finding data
Well known difficulties in search
Inconsistent and sparse annotation of scientific data
Many different names for the same thing
The same name means many things
“Hidden semantics”: 1 = male; 1 = present; 1 = mouse
Allen Brain Atlas, MGD, GENSAT
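“Hidden semantics” means the meaning of a value lives only in someone’s head or lab notebook. One minimal remedy, sketched here under assumptions, is to publish a codebook alongside each coded column and refuse to guess when a value is undocumented. The source names, field names and codes below are hypothetical.

```python
# Hypothetical codebooks: for each (source, field) pair, what each coded value means.
CODEBOOKS = {
    ("study_1", "sex"): {1: "male", 2: "female"},
    ("study_1", "lesion"): {0: "absent", 1: "present"},
    ("study_2", "species"): {1: "mouse", 2: "rat"},
}


def decode(source: str, field: str, value: int) -> str:
    """Translate a coded value into an explicit term, or fail loudly if undocumented."""
    codebook = CODEBOOKS.get((source, field))
    if codebook is None or value not in codebook:
        raise ValueError(f"no documented meaning for {field}={value} in {source}")
    return codebook[value]


# The same stored value, 1, means three different things:
print(decode("study_1", "sex", 1), decode("study_1", "lesion", 1), decode("study_2", "species", 1))
# male present mouse
```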
20. Result
• We are not publishing data in a form that is easy to integrate
• What we mean isn’t clear to a search engine (or even to a human)
• We use many different data structures to say the same thing
• We don’t provide crucial information
• Searching and navigating across individual resources takes an inordinate amount of human effort
Tempus Pecunia Est, painting by Richard Harpum
21. NIF: Minimum requirements to use shared data
You have to be able to find it
Accessible through the web
Structured or semi-structured
Annotations
You have to be able to use it
Data type specified and in a usable form
You have to know what the data mean
Semantics
Identity
1 = integer, time scale, male, left hemisphere
Context: Experimental metadata
Reporting neuroscience data within a consistent framework helps enormously
22. Whole Brain Catalog
Stephen Larson, Mark Ellisman http://wholebraincatalog.org
Uses a 3D game engine to bring together multiple data types within a common framework
24. What is an ontology?
Figure: Brain has a Cerebellum; Cerebellum has a Purkinje cell layer; Purkinje cell layer has a Purkinje cell; Purkinje cell is a neuron
Ontology: an explicit, formal representation of concepts and relationships among them within a particular domain that expresses human knowledge in a machine-readable form
Branch of philosophy: a theory of what is
e.g., Gene ontologies
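The diagram on this slide can be read as a handful of subject–relation–object statements. The sketch below stores them as plain triples and answers two simple questions by following links: what a Purkinje cell is, and what contains it. The relation names `has_a` and `is_a` follow the slide’s labels; the data structure and helper names are illustrative assumptions, not NIF code.

```python
# The slide's example as (subject, relation, object) triples.
TRIPLES = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje cell layer"),
    ("Purkinje cell layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "Neuron"),
]


def objects(subject: str, relation: str) -> list:
    """All direct objects of subject via relation."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def containers(entity: str) -> list:
    """Walk has_a links backwards to find everything that (transitively) contains entity."""
    found, frontier = [], [entity]
    while frontier:
        current = frontier.pop()
        for s, r, o in TRIPLES:
            if r == "has_a" and o == current and s not in found:
                found.append(s)
                frontier.append(s)
    return found


print(objects("Purkinje cell", "is_a"))  # ['Neuron']
print(containers("Purkinje cell"))  # ['Purkinje cell layer', 'Cerebellum', 'Brain']
```

Because the relationships are explicit, a program can answer “where is a Purkinje cell?” even though no single statement says “the brain contains Purkinje cells”.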
25. What ontology isn’t (or shouldn’t be)
A rigid, top-down, fixed hierarchy for limiting expression in the neurosciences
Not about restricting expression, but about how to express meaning clearly and in a machine-readable form
A bottomless resource-eating pit that consumes dollars and returns nothing
A cure-all for all our problems
A completely solved area
Applied vs theoretical
Easy to understand
Mike Bergman
26. What can ontology do for us?
Express neuroscience concepts in a way that is machine readable
Synonyms, lexical variants
Definitions
Provide means of disambiguation of strings
Nucleus part of cell; nucleus part of brain; nucleus part of atom
Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases GABA as a neurotransmitter
Properties
Quantities
Provide universals for navigating across different data sources
Semantic “index”
Perform reasoning
Link data through relationships not just one-to-one mappings
Provide the basis for concept-based queries to probe and mine data
As a branch of philosophy, make us think about the nature of the things we are trying to describe, e.g., a synapse is a site
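Disambiguating strings means mapping one label to the right concept. A minimal sketch using the slide’s “nucleus” example: each sense gets its own identifier and is distinguished by what it is part of. The identifiers (`ex:0001` and so on) and the idea of passing the context explicitly are hypothetical simplifications; real systems infer context from surrounding annotations.

```python
# Hypothetical sense inventory: one string, several concepts. Each concept has its
# own (made-up) identifier and is distinguished by what it is part of.
SENSES = {
    "nucleus": [
        {"id": "ex:0001", "part_of": "cell"},
        {"id": "ex:0002", "part_of": "brain"},
        {"id": "ex:0003", "part_of": "atom"},
    ],
}


def disambiguate(label: str, context: str) -> str:
    """Return the single concept identifier for label in the given context."""
    matches = [s["id"] for s in SENSES.get(label.lower(), []) if s["part_of"] == context]
    if len(matches) != 1:
        raise LookupError(f"{label!r} is unknown or still ambiguous in context {context!r}")
    return matches[0]


print(disambiguate("Nucleus", "brain"))  # ex:0002
```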
27. Linking datatypes to semantics: What is the average diameter of a Purkinje neuron dendrite?
Branch structure: not a tree, not a set of blood vessels, not a road map, but a DENDRITE
Because anyone who uses Neurolucida uses the same concepts (axon, dendrite, cell body, dendritic spine), information systems can combine the data in meaningful ways
Neurolucida doesn’t, however, tell you that the dendrite belongs to a neuron of a particular type, or whether this dendrite is a neural dendrite at all
( (Color Yellow) ; [10,1]
(Dendrite)
( 5.04 -44.40 -89.00 1.32) ; Root
( 3.39 -44.40 -89.00 1.32) ; R, 1
(
( 2.81 -45.10 -90.00 0.91) ; R-1, 1
( 2.81 -45.18 -90.00 0.91) ; R-1, 2
( 1.90 -46.01 -90.00 0.91) ; R-1, 3
( 1.82 -46.09 -90.00 0.91) ; R-1, 4
( 0.91 -46.59 -90.00 0.91) ; R-1, 5
( 0.41 -46.83 -92.50 0.91) ; R-1, 6
(
( -0.66 -46.92 -88.50 0.74) ; R-1-1, 1
( -0.74 -46.92 -88.50 0.74) ; R-1-1, 2
( -2.15 -47.25 -88.00 0.74) ; R-1-1, 3
( -2.15 -47.33 -88.00 0.74) ; R-1-1, 4
( -3.06 -47.00 -87.00 0.74) ; R-1-1, 5
( -4.05 -46.92 -86.00 0.74) ; R-1-1, 6
Output of Neurolucida neuron trace
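Because every Neurolucida trace uses the same point structure (x, y, z, then diameter), a program can answer part of the slide’s question directly from the file. A minimal sketch, assuming the plain-text ASC layout shown above; it reports an unweighted mean over sample points (a length-weighted mean would be more rigorous), and, as the slide points out, nothing in the file tells it whether this is a Purkinje cell dendrite.

```python
import re
from statistics import mean

# A Neurolucida ASC point looks like "( 2.81 -45.10 -90.00 0.91) ; R-1, 1":
# x, y, z, then diameter. Lines such as "(Dendrite)" or "(Color Yellow)" are skipped.
NUMBER = r"(-?\d+(?:\.\d+)?)"
POINT = re.compile(rf"^\s*\(\s*{NUMBER}\s+{NUMBER}\s+{NUMBER}\s+{NUMBER}\s*\)")


def point_diameters(asc_text: str) -> list:
    """Extract the fourth (diameter) value from every point line."""
    return [float(m.group(4)) for m in map(POINT.match, asc_text.splitlines()) if m]


def mean_point_diameter(asc_text: str) -> float:
    # Unweighted mean over sample points; segment lengths are ignored.
    return mean(point_diameters(asc_text))


EXCERPT = """( (Color Yellow) ; [10,1]
(Dendrite)
( 5.04 -44.40 -89.00 1.32) ; Root
( 3.39 -44.40 -89.00 1.32) ; R, 1
(
( 2.81 -45.10 -90.00 0.91) ; R-1, 1
(
( -0.66 -46.92 -88.50 0.74) ; R-1-1, 1
"""
print(mean_point_diameter(EXCERPT))
```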
28. “A rose by any other name...”:
Identity:
Entities are uniquely identifiable
Name is a meaningless numerical identifier (URI: Uniform Resource Identifier)
Any number of human-readable labels can be assigned to it
Definition:
Genus: is a type of (cell, anatomical structure, cell part)
Differentia: “has a”: a set of properties that distinguish among members of that class
Can include necessary and sufficient conditions
Implementation: How is this definition expressed?
Depending on the nature of the concept or entity and the needs of the information system, we can say more or fewer things
Different languages can express different things about the concept that can be computed upon
OWL (W3C standard), RDF
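The identity principle above (an opaque identifier with any number of labels attached) can be sketched in a few lines. The URI and namespace below are made up for illustration; the point is that lookups by any label or synonym resolve to the same identifier, so renaming a label never breaks links.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    uri: str    # opaque, permanent identifier; carries no meaning itself
    label: str  # preferred human-readable label
    synonyms: list = field(default_factory=list)


def build_label_index(entities: list) -> dict:
    """Map every label and synonym (case-insensitively) to the URIs that carry it."""
    index: dict = {}
    for entity in entities:
        for name in [entity.label, *entity.synonyms]:
            index.setdefault(name.lower(), set()).add(entity.uri)
    return index


# Made-up URI in a hypothetical namespace.
purkinje = Entity("http://example.org/onto#0000101", "Purkinje cell",
                  ["Purkinje neuron", "cerebellar Purkinje cell"])
index = build_label_index([purkinje])
print(index["purkinje neuron"])  # {'http://example.org/onto#0000101'}
```

The index maps a label to a set of URIs on purpose: if two entities share a label, the lookup surfaces the ambiguity instead of silently picking one.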
29. Entity recognition: Are you the M Martone who...
The Gene Wiki: community intelligence applied to human gene annotation. Huss JW 3rd, Lindenbaum P, Martone M, Roberts D, Pizarro A, Valafar F, Hogenesch JB, Su AI. Nucleic Acids Res. 2010 Jan;38(Database issue):D633-9.
Ontologies for Neuroscience: What are they and What are they Good for? Larson SD, Martone ME. Front Neurosci. 2009 May;3(1):60-7. Epub 2009 May 1.
Three-dimensional electron microscopy reveals new details of membrane systems for Ca2+ signaling in the heart. Hayashi T, Martone ME, Yu Z, Thor A, Doi M, Holst MJ, Ellisman MH, Hoshijima M. J Cell Sci. 2009 Apr 1;122(Pt 7):1005-13.
Traumatic brain injury and the goals of care. Martone M. Hastings Cent Rep. 2006 Mar-Apr;36(2):3.
Three-dimensional pattern of enkephalin-like immunoreactivity in the caudate nucleus of the cat. Groves PM, Martone M, Young SJ, Armstrong DM. J Neurosci. 1988 Mar;8(3):892-900.
Some analyses of forgetting of pictorial material in amnesic and demented patients. Martone M, Butters N, Trauner D. J Clin Exp Neuropsychol. 1986 Jun;8(3):161-78.
30. ID: 555 55 5555
Full URI: http://usagov/ss#555555555
Label: Maryann Elizabeth Martone
Synonyms: ME Martone, M Martone, Maryann
Abbreviation: MEM
Is a
Has a
Is that entity which has these properties
Figure labels: M Martone; Dept of Psychiatry, UCSD; MH Ellisman; Publications; Boston VA Hospital
Text mining algorithms can discover a lot of things about me
31. NIFSTD: Comprehensive Ontology
NIF covers multiple structural scales and domains of relevance to neuroscience
Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
Simple, basic “is a” hierarchies that can be used “as is” or to form the building blocks for more complex representations
NIFSTD modules: Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure
32. Query across resources: Snca and striatum
NIF uses the NIFSTD ontologies to query across sources that use very
different terminologies, symbolic notations and levels of granularity
34. Concept-based search: search by meaning
Search Google: GABAergic neuron
Search NIF: GABAergic neuron
NIF automatically searches for types of GABAergic neurons
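Searching by meaning, rather than by string, amounts to expanding a concept into the concept plus all of its subtypes before querying. A minimal sketch over a hypothetical fragment of a subclass hierarchy; the class names are illustrative, and NIF itself derives such sets from the NIFSTD ontology rather than from a hard-coded table.

```python
from collections import deque

# Hypothetical fragment of an asserted subclass hierarchy (child -> parent).
SUBCLASS_OF = {
    "Cerebellum Purkinje cell": "GABAergic neuron",
    "Cerebellum basket cell": "GABAergic neuron",
    "Neostriatum medium spiny neuron": "GABAergic neuron",
    "GABAergic neuron": "Neuron",
    "Cerebellum granule cell": "Neuron",
}


def descendants(cls: str) -> list:
    """Breadth-first walk down the hierarchy, collecting every subclass of cls."""
    found, queue = [], deque([cls])
    while queue:
        parent = queue.popleft()
        for child, p in SUBCLASS_OF.items():
            if p == parent:
                found.append(child)
                queue.append(child)
    return found


def concept_query(cls: str) -> str:
    """Expand a concept into a Boolean query over the concept and all its subtypes."""
    return " OR ".join(f'"{c}"' for c in [cls, *descendants(cls)])


print(concept_query("GABAergic neuron"))
```

A plain keyword search for “GABAergic neuron” would miss a record annotated only as “Cerebellum basket cell”; the expanded query finds it.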
35. NIF #1: You have to be able to find it...
What genes are upregulated by drugs of abuse in the adult mouse?
Morphine; increased expression; adult mouse
36. Integration of knowledge based on relationships
Looking for commonalities and distinctions among animal models and human conditions based on phenotypes
Sarah Maynard, Chris Mungall, Suzie Lewis NINDS
Figure labels: Thalamus; Midline nuclear group; Paracentral nucleus; Cellular inclusion; Lewy body
37. Building ontologies: modified OBO Foundry principles
NIF has adopted certain practices which we have found make it easier to build and work with ontologies in neuroscience
Unique numerical identifiers for class names
Single asserted hierarchies
Avoid multiple inheritance
Use community ontologies
One ontology per domain
Open Bio Ontologies
38. Asserted vs defined classes: the power of explicit semantics
Asserted class: Purkinje cell is a type of neuron
Why? Because I said so!
Defined class: Purkinje cell is a GABAergic neuron
http://ontology.neuinfo.org/NIF/BiomaterialEntities/NIF-Neuron-NT-Bridge.owl#nlx_neuron_nt_090803
Why? Because it is a member of the class Neuron that releases the neurotransmitter GABA
Logical definition based on properties
Membership in the class is computed by reasoners based on the satisfaction of a set of conditions
Makes building ontologies tractable because you don’t have to create multiple hierarchies; you can infer them
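The difference between asserted and defined classes can be imitated in ordinary code: each cell type records a single asserted parent plus properties, and class membership is computed from those properties. This is a hand-written check standing in for an OWL reasoner, not a reasoner itself; the record format is made up, and the transmitter facts are simplified textbook ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellType:
    name: str
    is_a: str              # the single asserted parent
    releases: frozenset    # neurotransmitters the cell type is recorded as releasing


# Toy data: simplified textbook facts in a made-up record format.
CELL_TYPES = [
    CellType("Cerebellum Purkinje cell", "Neuron", frozenset({"GABA"})),
    CellType("Cerebellum granule cell", "Neuron", frozenset({"glutamate"})),
    CellType("Neostriatum medium spiny neuron", "Neuron", frozenset({"GABA"})),
]


def is_gabaergic_neuron(cell: CellType) -> bool:
    # Defined class: any Neuron that releases GABA (necessary and sufficient conditions).
    return cell.is_a == "Neuron" and "GABA" in cell.releases


def classify_by_transmitter(cells) -> dict:
    """Derive one grouping per neurotransmitter from the recorded properties."""
    groups: dict = {}
    for cell in cells:
        if cell.is_a != "Neuron":
            continue
        for nt in sorted(cell.releases):
            groups.setdefault(nt, []).append(cell.name)
    return groups


print([c.name for c in CELL_TYPES if is_gabaergic_neuron(c)])
print(classify_by_transmitter(CELL_TYPES))
```

Nobody asserted “Purkinje cell is a GABAergic neuron”; the membership follows from the properties, and adding a new property (location, morphology) would yield further groupings without editing the asserted hierarchy.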
39. Reclassification of a flat hierarchy based on logical definitions
The principle of single inheritance
• Each class belongs to only a single asserted hierarchy that is generally fairly uninteresting
• Through the assignment of properties and restrictions, each class may belong to many defined hierarchies
• The criteria for membership in that class are explicit
• Easier bookkeeping
40. The case for shared ontologies
Figure: a top-down anatomy ontology and a cell-centered anatomy ontology that describe overlapping structures with different terms (Brain, Cerebellum, Cerebellar cortex, Cerebellar Purkinje cell / Purkinje neuron, Purkinje cell soma, Purkinje cell layer, IP3)
• Creating the linkages requires mapping
• Mapping is usually incomplete and not always possible
• Can’t take advantage of others’ work
41. Shared building blocks: Knowledge base is enriched
Figure: terms reused from shared ontologies and linked by “has part” / “is part of” relations: Cerebellum Purkinje cell soma, Cerebellum Purkinje cell dendrite and Cerebellum Purkinje cell axon (Cell part ontology); Cerebellum granule cell layer, Cerebellum Purkinje cell layer, Cerebellum molecular layer and Cerebellar cortex (Anatomy ontology); Cerebellum Purkinje neuron (Cell Ontology); Calbindin; IP3 (CHEBI:16595)
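Why shared building blocks enrich the knowledge base can be shown with a small inference: two hypothetical ontology extracts, one stating “has part” from the anatomy side and one stating “part of” from the cell-part side, are merged and closed transitively. The merged result contains statements neither source makes alone. Treating `has_part` as the exact inverse of `part_of` is a simplification made for this sketch; formal relation ontologies are more careful about that.

```python
# Hypothetical extracts from two separately maintained ontologies.
ANATOMY = [
    ("Cerebellar cortex", "has_part", "Cerebellum Purkinje cell layer"),
    ("Cerebellar cortex", "has_part", "Cerebellum molecular layer"),
    ("Cerebellar cortex", "has_part", "Cerebellum granule cell layer"),
]
CELL_PARTS = [
    ("Cerebellum Purkinje cell soma", "part_of", "Cerebellum Purkinje cell layer"),
    ("Cerebellum Purkinje cell dendrite", "part_of", "Cerebellum molecular layer"),
]


def part_of_closure(triples) -> set:
    """Normalize has_part to part_of, then compute the transitive closure."""
    edges = set()
    for subject, relation, obj in triples:
        if relation == "part_of":
            edges.add((subject, obj))
        elif relation == "has_part":
            edges.add((obj, subject))  # simplification: treat has_part as the inverse of part_of
    changed = True
    while changed:
        new = {(a, d) for a, b in edges for c, d in edges if b == c} - edges
        edges |= new
        changed = bool(new)
    return edges


merged = part_of_closure(ANATOMY + CELL_PARTS)
# Neither source states this on its own; it follows only from combining them.
print(("Cerebellum Purkinje cell soma", "Cerebellar cortex") in merged)  # True
```

This only works because both extracts use the same identifiers for the layers; with independently named terms, the mapping problem from the previous slide would come first.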
42. Access to shared ontologies
Neuroscience Information Framework (http://neuinfo.org): Ontologies available as OWL file, RDF and through Web Services
https://confluence.crbs.ucsd.edu/display/NIF/OntoQuestMain
NCBO BioPortal (http://bioportal.bioontology.org/): Repository of ontologies for biomedical research
199 ontologies (including NIFSTD)
Contains many mappings
Provides annotation services
INCF Program on Ontologies for Neural Structures
Neuronal Registry Task Force
Description of neural properties
Structural Lexicon
Description of properties across scales
44. NeuroLex Wiki
http://neurolex.org Stephen Larson
Semantic Wiki: provides a community interface for viewing, enhancing and modifying NIFSTD ontologies
• Provide a simple framework for defining the concepts required
• Cell, part of brain, subcellular structure, molecule
• On demand
• Assign permanent URI
• Ontologists/knowledge engineers build in complexity
• Tries to teach and adhere to basic best practices
45. Define by rules: Generate multiple classifications programmatically
46. Enriching the knowledge base
Members of this class are automatically generated according to a rule expressed in a standard query language
47. Inferring the Mesoscale
The NIFSTD is expressed in OWL (Web Ontology Language)
Supports reasoning and inference
Through integration with other ontologies covering gross anatomy and molecular entities, we are working to create inferences across scales
Analyze locally; infer globally
Larson and Martone, 2007
Stephen Larson
48. Inferencing across scales: Compare statements
1. Look up the brain region in NeuroLex
2. Look up the cells contained in the brain region
3. Find those cells that are known to project out of that brain region
4. Look up the neurotransmitters for those cells
5. Determine whether those neurotransmitters are known to be excitatory or inhibitory
6. Report the projection as excitatory or inhibitory, and report the entire chain of logic with links back to the wiki pages where the statements were made
7. Make sure the user can get back to each statement in the logic chain to edit it if they think it is wrong
Stephen Larson
CHEBI:18243
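Steps 2 to 6 of the procedure above can be sketched over a tiny in-memory knowledge base. Everything here is an assumption for illustration: the property names, the dictionary layout and the wiki URL scheme are hypothetical, and a real implementation would query NeuroLex rather than a hard-coded table. The key design point it preserves is that each conclusion carries the full chain of statements behind it, each linked to where it was made (step 7 then amounts to following those links).

```python
# Hypothetical miniature knowledge base: (subject, property) -> value.
KB = {
    ("Cerebellar cortex", "contains_cell"): ["Cerebellum Purkinje cell", "Cerebellum granule cell"],
    ("Cerebellum Purkinje cell", "projects_out_of_region"): True,
    ("Cerebellum granule cell", "projects_out_of_region"): False,
    ("Cerebellum Purkinje cell", "neurotransmitter"): "GABA",
    ("GABA", "effect"): "inhibitory",
    ("glutamate", "effect"): "excitatory",
}


def page(subject: str) -> str:
    # Hypothetical wiki URL scheme; each statement links back to where it was made.
    return "https://wiki.example.org/" + subject.replace(" ", "_")


def infer_projections(region: str) -> list:
    reports = []
    for cell in KB.get((region, "contains_cell"), []):            # step 2
        if not KB.get((cell, "projects_out_of_region"), False):  # step 3
            continue
        transmitter = KB.get((cell, "neurotransmitter"))         # step 4
        effect = KB.get((transmitter, "effect"))                  # step 5
        if effect is None:
            continue  # a missing link means no conclusion, not a guess
        chain = [
            (region, "contains_cell", cell),
            (cell, "projects_out_of_region", True),
            (cell, "neurotransmitter", transmitter),
            (transmitter, "effect", effect),
        ]
        # Step 6: report the conclusion together with every statement behind it.
        reports.append({
            "cell": cell,
            "projection": effect,
            "evidence": [(s, p, o, page(s)) for s, p, o in chain],
        })
    return reports


for report in infer_projections("Cerebellar cortex"):
    print(report["cell"], "->", report["projection"])
    for statement in report["evidence"]:
        print("   ", statement)
```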
49. A semantic web for neuroscience? Good idea.
So all I have to do is...
Express your data in RDF?
Well...
Which RDF
Bio2RDF, BioRDF, Linked Data, Open Data, Semantic Web
Use URIs for all data elements
Well...
What exactly does that mean?
Shared Names, BioRDF, my own?
Use shared ontologies?
Well...
Which ones?
I don’t have one
They’re not stable
They take too long
I’d rather share your toothbrush
Wait for Watson 3.0
Effective data sharing is still an act of will
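At its simplest, “express your data in RDF with URIs” means writing statements as subject, predicate, object, where the subject and predicate are URIs. The sketch below serializes a couple of statements as N-Triples using only the standard library; the `rdf:type` and `rdfs:label` URIs are the standard W3C ones, while the `example.org` namespace and record are hypothetical. In practice a library such as rdflib would handle serialization; choosing *which* shared URIs to use, the hard part this slide is about, is not solved by the code.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/data/"  # hypothetical namespace for the example


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialize a plain string literal, escaping characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """One 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


record = EX + "record/0001"
doc = to_ntriples([
    (iri(record), iri(RDF_TYPE), iri(EX + "Neuron")),
    (iri(record), iri(RDFS_LABEL), literal("Purkinje cell trace")),
])
print(doc)
```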
50. We do know some things
NIF Blog
1. Register your resource with NIF!!!!
2. Mindfulness
Resource providers: Mindfulness that your resource is contributing data to a global federation
Link to shared ontology identifiers where possible
Stable and unique identifiers for data
Explicit semantics
Database, model, atlas
Researchers: Mindfulness when publishing data that it is to be consumed by machines and not just your colleagues
Accession numbers for genes and species
Catalog numbers for reagents
Provide supplemental data in a form that is easy to reuse
51. Many thanks to...
Amarnath Gupta, UCSD, Co Investigator
Jeff Grethe, UCSD, Co Investigator
Anita Bandrowski, NIF Curator
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Caltech
Arun Rangarajan
Hans Michael Muller
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam, NIF Ontology Engineer
Karen Skinner, NIH, Program Officer
Mark Ellisman
Lee Hornbrook
Kara Lu
Vadim Astakhov
Xufei Qian
Chris Condit
Stephen Larson
Sarah Maynard
Bill Bug