Big data from small data: A deep survey of the neuroscience landscape data via the Neuroscience Information Framework

Maryann Martone, Ph.D.
University of California, San Diego
“Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation
   to its multiple layers of organization that operate at different spatial and
   temporal scales. Central to this effort is tackling “neural choreography” --
   the integrated functioning of neurons into brain circuits-- Neural
   choreography cannot be understood via a purely reductionist approach.
   Rather, it entails the convergent use of analytical and synthetic tools to
   gather, analyze and mine information from each level of analysis, and
   capture the emergence of new layers of function (or dysfunction) as we
   move from studying genes and proteins, to cells, circuits, thought, and
   behavior....

However, the neuroscience community is not yet fully engaged in exploiting the
  rich array of data currently available, nor is it adequately poised to capitalize
  on the forthcoming data explosion.”
                                     Akil et al., Science, Feb 11, 2011
“Data choreography”
 In that same issue of Science
   Asked peer reviewers from last year about the availability and use of
     data
      About half of those polled store their data only in their
        laboratories—not an ideal long-term solution.
      Many bemoaned the lack of common metadata and archives as a
        main impediment to using and storing data, and most of the
        respondents have no funding to support archiving
      And even where accessible, much data in many fields is too poorly
        organized to enable it to be efficiently used.

   “...it is a growing challenge to ensure that data produced during the
     course of reported research are appropriately
     described, standardized, archived, and available to all.” Lead Science
     editorial (Science 11 February 2011: Vol. 331 no. 6018 p. 649 )
A data federation problem


 No single technology serves these all equally well.
 Multiple data types; multiple scales; multiple databases
 [Figure: example data types across scales]
    Whole brain data (20 um microscopic MRI)
    Mosaic LM images (1 GB+)
    Conventional LM images
    Individual cell morphologies
    EM volumes & reconstructions
    Solved molecular structures
 Neuroscience is unlikely to be served by a few large databases like the genomics and proteomics community
 NIF is an initiative of the NIH Blueprint consortium of institutes
   What types of resources (data, tools, materials, services) are
    available to the neuroscience community?
   How many are there?
   What domains do they cover? What domains do they not cover?
   Where are they?
      Web sites
      Databases
      Literature
      Supplementary material
      PDF files
      Desk drawers
   Who uses them?
   Who creates them?
   How can we find them?
   How can we make them better in the future?
 http://neuinfo.org
We need more databases (?)




•NIF Registry: A catalog of neuroscience-relevant resources
    •> 5000 currently listed
    •> 2000 databases
•And we are finding more every day
But we have Google!

 Current web is designed to share documents
   Documents are unstructured data
 Much of the content of digital resources is part of the “hidden web”
 Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
NIF must work with the ecosystem as it is today
 NIF has developed a production technology platform for
  researchers to discover, share, access, analyze, and
  integrate neuroscience-relevant information
   Semantically-enabled search engine and interface that customizes
    results for neuroscience
   System that searches the “hidden web”, i.e., content not well served by
    search engines
      Data resources are predominantly relational, XML, text, RDF, OWL
   Automated data harvesting technologies that produce dynamic indices
    of data content including databases, web pages, text, XML, etc.
   Tools to make products and data available
   Designed to be populated rapidly; set up process for progressive
    refinement
NIF accomplishments
   Assembled the largest searchable collation of neuroscience data on the web
       Data federation
       Resource registry (materials, data, tools, services)
       PubMed literature
           Full text of open access
   The largest ontology for neuroscience
   NIF search portal: simultaneous search over data, NIF catalog and biomedical literature
   Neurolex Wiki: a community wiki serving neuroscience concepts
   A unique technology platform
   A reservoir of cross-disciplinary biomedical data expertise
 NIF is poised to capitalize on the new tools and emphasis on big data and open science
 UCSD, Yale, Cal Tech, George Mason, Washington Univ
NIF data federation
 > 180 sources; 350 M records: NIF was designed to be populated rapidly, with progressive refinement of data
 [Chart: Percentage of data records per data type. Microarray: 98%]
 [Chart: Percentage of data records per data type, everything but microarray. Brain activation foci; Animals; Images; Pathways; Drugs; Connectivity; Antibodies; Grants]
What do you mean by data?
      Databases come in many shapes and sizes
 Primary data:
     Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
 Secondary data
     Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
 Tertiary data
     Claims and assertions about the meaning of data
       E.g., gene upregulation/downregulation,
 Registries:
     Metadata
     Pointers to data sets or materials stored elsewhere
 Data aggregators
     Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
 Single source
     Data acquired within a single context, e.g., Allen Brain Atlas
 Researchers are producing a variety of information artifacts using a multitude of technologies
What types of questions can I ask?
We’d like to be able to find:
 What is known:
   What is the average diameter of a Purkinje neuron?
   Is GRM1 expressed in cerebral cortex?
   What are the projections of hippocampus?
   What genes have been found to be upregulated in
    chronic drug abuse in adults?
   Is there a database of fMRI studies?
   What studies used my polyclonal antibody against
    GABA in humans?
   What rat strains have been used most
     extensively in research during the last 20 years?


 What is not known:
   Connections among data
   Gaps in knowledge
 Without some sort of framework, very difficult to do
What are the connections of the hippocampus?
 Query: Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
 [Screenshot annotations]
   Query expansion: Synonyms and related concepts
   Boolean queries
   Data sources categorized by “data type” and level of nervous system
   Common views across multiple sources
   Link back to record in original source
   Tutorials for using full resource when getting there from NIF
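The query expansion step can be pictured with a minimal sketch. The synonym table and the `expand_query` function below are hypothetical illustrations, not NIF's implementation; a real system would presumably draw synonyms from an ontology rather than a hard-coded dictionary.

```python
# Hypothetical synonym table for illustration only; not taken from NIF.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term, synonyms=SYNONYMS):
    """Return a Boolean OR query covering the term and its known synonyms."""
    variants = [term] + synonyms.get(term.lower(), [])
    # Quote multi-word variants so a search engine treats them as phrases.
    quoted = ['"{}"'.format(v) if " " in v else v for v in variants]
    return " OR ".join(quoted)


print(expand_query("Hippocampus"))
# Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"
```

A term with no entry in the table is returned unchanged, so the expansion never narrows a search.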
Results are organized within a common framework
 [Figure: relation names used by different connectivity resources]
   Target site; Synapsed by; Innervates; Connects to; Input region; Synapsed with; Cellular contact; Projects to; Axon innervates; Subcellular contact; Source site
Each resource implements a different, though related model;
systems are complex and difficult to learn, in many cases
The scourge of neuroanatomical nomenclature:
    Importance of NIF semantic framework
•NIF Connectivity: 7 databases containing connectivity primary data or claims
from literature on connectivity between brain regions
    •Brain Architecture Management System (rodent)
    •Temporal lobe.com (rodent)
    •Connectome Wiki (human)
    •Brain Maps (various)
    •CoCoMac (primate cortex)
    •UCLA Multimodal database (Human fMRI)
    •Avian Brain Connectivity Database (Bird)

•Total: 1800 unique brain terms (excluding Avian)

•Number of exact terms used in > 1 database: 42
•Number of synonym matches: 99
•Number of 1st order partonomy matches: 385
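As a rough illustration of how the exact-match count above could be obtained, the sketch below counts terms that appear in more than one vocabulary. The three vocabularies and the function name `shared_terms` are invented for the example; matching here is case-insensitive exact matching only, with no synonym or partonomy reasoning.

```python
from collections import Counter


def shared_terms(vocabularies):
    """Return terms (case-insensitive exact match) used by more than one database."""
    counts = Counter()
    for terms in vocabularies.values():
        # Normalise within each database first so a database counts once per term.
        counts.update({t.strip().lower() for t in terms})
    return {term for term, n in counts.items() if n > 1}


# Toy vocabularies, invented for illustration only.
vocabularies = {
    "db_a": {"Hippocampus", "CA1", "Dentate gyrus"},
    "db_b": {"hippocampus", "Ammon's horn", "CA1 field"},
    "db_c": {"Hippocampal formation", "Dentate Gyrus"},
}
print(sorted(shared_terms(vocabularies)))
# ['dentate gyrus', 'hippocampus']
```

Note that "CA1" and "CA1 field", or "Hippocampus" and "Hippocampal formation", are not matched by this approach, which is the kind of gap a semantic framework is meant to close.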
NIF’s minimum requirements for
          effective data sharing
      You (and the machine) have to be able to
        find it
         Accessible through the web
         Annotations
      You have to be able to use it
        Data type specified and in a usable form
      You have to know what the data mean
          Some semantics
          Context: Experimental metadata
          Provenance: Where did the data come from?

Reporting neuroscience data within a consistent framework helps enormously
What is an ontology?

 Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine readable form
 Branch of philosophy: a theory of what is
 e.g., Gene ontologies
 [Diagram: Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron]
 [Cartoon: speaker says “You need to use ontology identifiers instead of strings”; listener hears “Blah, blah, ontology blah”]

“Ontology as mathematics, computer science or esperanto” - Andrey Rzhetsky and James A. Evans
What can ontology do for us?
                                      “Esperanto!”

 Express neuroscience concepts in a way that is machine readable
   Classes are identified by unique identifiers
   Synonyms, lexical variants
   Definitions
     Provide means of disambiguation of strings
        Nucleus part of cell; nucleus part of brain; nucleus part of atom
     Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases
       GABA as a neurotransmitter
     Properties
 Provide universals for navigating across different data sources
   Semantic “index”
   Perform reasoning
   Link data through relationships not just one-to-one mappings
      “Concept-based queries”
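A small sketch of why identifiers disambiguate where strings cannot, using the "nucleus" example above. The `Concept` class, the `EX:` identifiers, and the `candidates` function are all hypothetical; they are not NIFSTD identifiers or a NIF API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    identifier: str  # hypothetical IDs, not real ontology identifiers
    label: str
    definition: str
    synonyms: tuple = ()


CONCEPTS = [
    Concept("EX:0001", "nucleus", "part of cell", ("cell nucleus",)),
    Concept("EX:0002", "nucleus", "part of brain", ("brain nucleus",)),
    Concept("EX:0003", "nucleus", "part of atom", ("atomic nucleus",)),
]


def candidates(text):
    """Return every concept whose label or synonym matches the string."""
    key = text.strip().lower()
    return [
        c for c in CONCEPTS
        if key == c.label.lower() or key in {s.lower() for s in c.synonyms}
    ]


# The bare string is ambiguous; each identifier is not.
print([c.identifier for c in candidates("Nucleus")])
print([c.identifier for c in candidates("atomic nucleus")])
```

The bare string "nucleus" resolves to three candidates, while data annotated with an identifier carries exactly one meaning.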
Power of unique identifiers: Are you the M Martone who...
The Gene Wiki: community intelligence applied to human gene annotation. Huss JW 3rd, Lindenbaum P, Martone M, Roberts D, Pizarro A, Valafar F, Hogenesch JB, Su AI. Nucleic Acids Res. 2010 Jan;38(Database issue):D633-9.

Ontologies for Neuroscience: What are they and What are they Good for? Larson SD, Martone ME. Front Neurosci. 2009 May;3(1):60-7. Epub 2009 May 1.

Three-dimensional electron microscopy reveals new details of membrane systems for Ca2+ signaling in the heart. Hayashi T, Martone ME, Yu Z, Thor A, Doi M, Holst MJ, Ellisman MH, Hoshijima M. J Cell Sci. 2009 Apr 1;122(Pt 7):1005-13.

Some analyses of forgetting of pictorial material in amnesic and demented patients. Martone M, Butters N, Trauner D. J Clin Exp Neuropsychol. 1986 Jun;8(3):161-78.

Traumatic brain injury and the goals of care. Martone M. Hastings Cent Rep. 2006 Mar-Apr;36(2):3.

Three-dimensional pattern of enkephalin-like immunoreactivity in the caudate nucleus of the cat. Groves PM, Martone M, Young SJ, Armstrong DM. J Neurosci. 1988 Mar;8(3):892-900.
I am not a number (but I should be)
   Full URI: Uniform Resource Identifier
       http://orcid.org/1234567
   Label: Maryann Elizabeth Martone
     Synonym: ME Martone, M Martone, Maryann
     Abbreviation: MEM
     Is a
     Has a
     Is that entity which has these properties
 [Diagram: M Martone linked to Boston VA Hospital; Dept of Psychiatry, UCSD; Female; Nelson Butters; Publications]
 Text mining algorithms can discover a lot of things about me
 ORCID project: Author IDs
NIF Semantic Framework: NIFSTD ontology
 [Diagram: NIFSTD modules: Organism; Anatomical Structure; Cell; Dysfunction; Quality; Molecule; Subcellular structure; NS Function; Investigation; Macromolecule; Gene; Techniques; Resource; Instrument; Molecule Descriptors; Reagent; Protocols]
 NIF covers multiple structural scales and domains of relevance to neuroscience
 Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
 Simple, basic “is a” hierarchies that can be used “as is” or to form the building blocks for more complex representations
“We studied the behavior of CA2-binding proteins in Ca2 neurons under high and low Ca2 conditions”
 NIF queries across 170+ independent databases
 [Figure labels: BioGrid; Allen Brain Atlas; Brain Info]
But you don’t have what I need!
•Provide a simple framework for defining the concepts required
     •Cell, Part of brain, subcellular structure, molecule
•Community based:
    •Communities contribute their vocabularies
    •Reconcile and align concepts used by different domains
•Each concept gets its own unique identifier
•Creating a computable index for neuroscience data
    •INCF
http://neurolex.org
Demo D03; Stephen Larson/INCF
Concept-based search: search by meaning
 Search Google: GABAergic neuron
 Search NIF: GABAergic neuron
    NIF automatically searches for types of
      GABAergic neurons

                        Types of GABAergic
                             neurons
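Searching by meaning can be sketched as expanding a concept to everything beneath it in an “is a” hierarchy before querying. The edges and the `expand_concept` function below are a toy illustration under that assumption, not NIFSTD content or NIF's search code.

```python
# Toy "is a" edges (child -> parent), chosen for illustration only.
IS_A = {
    "Purkinje cell": "GABAergic neuron",
    "basket cell": "GABAergic neuron",
    "GABAergic neuron": "neuron",
    "pyramidal cell": "glutamatergic neuron",
    "glutamatergic neuron": "neuron",
}


def expand_concept(term, is_a=IS_A):
    """Return the term plus all of its direct and indirect subtypes."""
    children = {}
    for child, parent in is_a.items():
        children.setdefault(parent, []).append(child)
    found, stack = [term], [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found


print(expand_concept("GABAergic neuron"))
```

A string search for "GABAergic neuron" would miss records that mention only "Purkinje cell"; the expanded term list would not.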
Esperanto!

 “The trouble is that if I make up all of my own URIs, my [data]
   has no meaning to anyone else unless I explain what each URI is
   intended to denote or mean. Two [data sets] with no URIs in
   common have no information that can be interrelated.”
 NIF favors reuse of identifiers rather than mapping
   NIF imports many ontologies

 Creating ontologies to be used as common building blocks:
   modularity, low semantic overhead, is important
    Many community ontologies available covering multiple domains
      NIFSTD available via web services
      Bioportal (http://bioportal.bioontology.org/)



http://www.rdfabout.com/intro/#Introducing%20RDF
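The point of the quotation can be made concrete with triples represented as plain tuples. Every URI below is a made-up `example.org` placeholder and `shared_uris` is a hypothetical helper; the sketch only shows that two data sets can be interrelated exactly where they reuse the same identifiers.

```python
def shared_uris(graph_a, graph_b):
    """Return the URIs that occur in both sets of (subject, predicate, object) triples."""
    def uris(graph):
        return {term for triple in graph for term in triple if term.startswith("http")}
    return uris(graph_a) & uris(graph_b)


# Placeholder URIs for illustration; none of these resolve to real resources.
NEURON = "http://example.org/id/neuron-1"
REGION = "http://example.org/id/region-9"
LOCATED_IN = "http://example.org/rel/located_in"

lab_one = {(NEURON, LOCATED_IN, REGION)}
lab_two_reusing = {(REGION, "http://example.org/rel/label", "striatum")}
lab_two_private = {("http://lab2.example.org/r9", "http://lab2.example.org/label", "striatum")}

print(shared_uris(lab_one, lab_two_reusing))   # one shared URI: the region
print(shared_uris(lab_one, lab_two_private))   # set(): nothing to join on
```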
NIF Analytics: The Neuroscience Ecosystem
 Where are the data?
 [Figure: brain region vs. data source; labeled regions include Brain, Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex]
 NIF is in a unique position to answer questions about the neuroscience ecosystem
 Vadim Astakhov, Kepler Workflow Engine
Whither neuroscience information?



 [Diagram: nested regions]
   What is potentially knowable: ∞
   What is known: Literature, images, human knowledge
     Unstructured; Natural language processing, entity recognition, image processing and analysis; communication
   What is easily machine processable and accessible
Open world meets closed world


 But...NIF has > 900,000 antibodies, 250,000 model organisms, and 3 million microarray records
 [Figure: Query for “reference” brain structures and their parts in NIF Connectivity database]
Gender bias

 NIF can start to answer interesting questions about neuroscience research, not just about neuroscience
 [Figure: NIF Reports: Male vs Female]
What have we learned: Grabbing
   the long tail of small data
 Analysis of NIF shows
  multiple databases with
  similar scope and content

 Many contain partially
  overlapping data

 Data “flows” from one
  resource to the next
   Data is reinterpreted, reanalyzed or added to

 Is duplication good or bad?
Embracing duplication: Data Mash ups




   •NIF queries across 3 of approximately 10 fMRI databases
   •~300 PMIDs were common between Brede and SUMSdb
        •PMID serves as a unique identifier for an article
   •Same information; value added
          Same data; different aspects
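Because the PMID identifies the article, records from two sources can be joined on it. The sketch below uses invented records, invented PMIDs, and hypothetical field names (`foci`, `surface_map`); it does not reflect the actual schemas of Brede or SUMSdb.

```python
def common_articles(records_a, records_b):
    """Merge records from two sources that describe the same article (same PMID)."""
    by_pmid_a = {r["pmid"]: r for r in records_a}
    by_pmid_b = {r["pmid"]: r for r in records_b}
    shared = by_pmid_a.keys() & by_pmid_b.keys()
    # Each merged record carries the fields contributed by both sources.
    return {pmid: {**by_pmid_a[pmid], **by_pmid_b[pmid]} for pmid in sorted(shared)}


# Invented records; the PMIDs and field names are placeholders.
source_a = [{"pmid": 100001, "foci": 12}, {"pmid": 100002, "foci": 5}]
source_b = [{"pmid": 100002, "surface_map": True}, {"pmid": 100003, "surface_map": False}]

print(common_articles(source_a, source_b))
# {100002: {'pmid': 100002, 'foci': 5, 'surface_map': True}}
```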
Same data: different analysis
               Chronic vs acute morphine in striatum
 Gemma: Gene ID + Gene Symbol
 DRG: Gene name + Probe ID

 Gemma presented results relative to baseline chronic
morphine; DRG with respect to saline, so direction of
change is opposite in the 2 databases

 Analysis:
   1370 statements from Gemma regarding gene expression as
    a function of chronic morphine
   617 were consistent with DRG; over half of the claims of
    the paper were not confirmed in this analysis
   Results for 1 gene were opposite in DRG and Gemma
   45 did not have enough information provided in the paper to
    make a judgment
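The baseline issue described above can be sketched as a normalisation step applied before claims are compared. This is a simplified illustration that assumes a two-condition comparison (saline vs. chronic morphine) and uses placeholder gene names; it is not the analysis pipeline that produced the counts on this slide.

```python
FLIP = {"up": "down", "down": "up"}


def normalise(claim, reference="saline"):
    """Express a claim's direction of change relative to the reference condition."""
    direction = claim["direction"]
    # With only two conditions, a different baseline means the opposite direction.
    if claim["baseline"] != reference:
        direction = FLIP[direction]
    return direction


def compare(claims_a, claims_b):
    """Label each gene found in both sources as 'consistent' or 'opposite'."""
    b_by_gene = {c["gene"]: c for c in claims_b}
    result = {}
    for claim in claims_a:
        other = b_by_gene.get(claim["gene"])
        if other is None:
            continue  # no matching claim, so no judgment can be made
        same = normalise(claim) == normalise(other)
        result[claim["gene"]] = "consistent" if same else "opposite"
    return result


# Placeholder claims; GENE1 and GENE2 are not real results.
gemma_like = [
    {"gene": "GENE1", "direction": "down", "baseline": "chronic morphine"},
    {"gene": "GENE2", "direction": "up", "baseline": "chronic morphine"},
]
drg_like = [
    {"gene": "GENE1", "direction": "up", "baseline": "saline"},
    {"gene": "GENE2", "direction": "up", "baseline": "saline"},
]
print(compare(gemma_like, drg_like))
# {'GENE1': 'consistent', 'GENE2': 'opposite'}
```

Without the normalisation step, GENE1 would be reported as a disagreement even though both sources describe the same change.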
Taking a global view on data:
            microculture to ecosystem
 Several powerful trends should change the way we
  think about our data: One → Many
   Many data
     Generation of data is getting easier → shared data
     Data space is getting richer: more -omes every day
     But...compared to the biological space, still sparse
   Many eyes
     Wisdom of crowds
     More than one way to interpret data
   Many algorithms
     Not a single way to analyze data
   Many analytics
     “Signatures” in data may not be directly related to the question for
       which they were acquired but tell us something really interesting

                  Are you exposing or burying your work?
The future of scientific communication
       We have learned over the years how to write a scientific paper for other humans to read and for other agents to index
         We now have to learn how to write papers for automated agents (and their humans) to mine
       We have learned over the years to report data in papers for humans to read
         We now have to learn how to publish data in a form and on a suitable platform for automated agents (and their humans) to mine
 [Figure labels: Printing press; Linked data cloud; Watson]
Reporting neuroscience data within a consistent framework helps enormously
Why does it matter?
     47/50 major preclinical published cancer studies could not be replicated
     “The scientific community assumes that the claims in a preclinical study can be taken at face value -- that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
     “There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
     Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531
     Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data
     Data, not just stories about them!
Register your resource to NIF!
 1  “How do I share my data?”
 2  “There is no database for my data”
 3  Community database: beginning
 4  Community database: End
 [Figure labels: Institutional repositories; Cloud; INCF: Global infrastructure; Education; Industry; Government]

NIF is designed to leverage existing investments in resources and infrastructure
It’s a messy ecosystem (and that’s OK)
 NIF favors a hybrid, tiered, federated system
 Domain knowledge
   Ontologies
   [Diagram: Gene; Organism; Neuron; Brain part; Disease]
 Claims about results
   Virtuoso RDF triples
   [Examples: Caudate projects to Snpc; Grm1 is upregulated in chronic cocaine; Betz cells degenerate in ALS]
 Data
   Data federation
   Workflows
 Narrative
Future of Research Communications
         and e-Scholarship
 FORCE11: http://force11.org
   Founded by Phil Bourne, Tim
    Clark, Ed Hovy, Anita de Waard
    and Ivan Herman
   Bring together stakeholders with
    an interest in moving scholarly
    communication beyond reliance
    on papers and traditional impact
    metrics
   Beyond the PDF 2: Spring 2013
NIF team (past and present)
Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam, NIF Ontology Engineer
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Lee Hornbrook
Binh Ngo
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer
Why do we create so many overlapping products?
 “That which I cannot build, I cannot understand”
    Don’t trust any data you haven’t generated
    Oh, now I see what you are saying
    Scientists know the domain, not informatics
 Yes, we are planning to do that...
    We are all time and resource constrained
    We extend projects in time
 Science is incremental; we build on the results of others
    It’s ingrained in our culture
    “Build a better mousetrap and the world will beat down our doors”
    Little credit for making someone else’s product better
 There’s more than one way to skin a cat....
    We are still mastering the medium
    Technology is developing fast
 [Cartoon: speaker says “You need to use ontology identifiers instead of strings”; listener hears “Blah, blah, ontology blah”]
When I talk to resource providers, neuroscientists (and journal editors)...
Knowledge management for integrative omics data analysis
 
Itbi
ItbiItbi
Itbi
 

Destaque (8)

Big Data Landscape 2016
Big Data Landscape 2016Big Data Landscape 2016
Big Data Landscape 2016
 
Big data from small data:  A survey of the neuroscience landscape through the...
Big data from small data:  A survey of the neuroscience landscape through the...Big data from small data:  A survey of the neuroscience landscape through the...
Big data from small data:  A survey of the neuroscience landscape through the...
 
Beyond the Top 10 - Combining Profiling and Mobile Behavioral Data for Easy I...
Beyond the Top 10 - Combining Profiling and Mobile Behavioral Data for Easy I...Beyond the Top 10 - Combining Profiling and Mobile Behavioral Data for Easy I...
Beyond the Top 10 - Combining Profiling and Mobile Behavioral Data for Easy I...
 
Big Transaction Data - CMG Vegas 2012
Big Transaction Data - CMG Vegas 2012Big Transaction Data - CMG Vegas 2012
Big Transaction Data - CMG Vegas 2012
 
McKinsey Big Data Overview
McKinsey Big Data OverviewMcKinsey Big Data Overview
McKinsey Big Data Overview
 
W4P-Launch - Open Source Crowdsourcing platform
W4P-Launch - Open Source Crowdsourcing platformW4P-Launch - Open Source Crowdsourcing platform
W4P-Launch - Open Source Crowdsourcing platform
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 

Big data from small data: A deep survey of the neuroscience landscape data via

  • 1. Big data from small data: A deep survey of the neuroscience landscape data via the Neuroscience Information Framework Maryann Martone, Ph. D. University of California, San Diego
  • 2. “Neural Choreography”
      “A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits -- Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
      Akil et al., Science, Feb 11, 2011
  • 3. “Data choreography”
      • In that same issue of Science, peer reviewers from the previous year were asked about the availability and use of data:
          • About half of those polled store their data only in their laboratories, which is not an ideal long-term solution.
          • Many bemoaned the lack of common metadata and archives as a main impediment to using and storing data, and most of the respondents have no funding to support archiving.
          • And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used.
      • “...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.” Lead Science editorial (Science 11 February 2011: Vol. 331 no. 6018 p. 649)
  • 4. A data federation problem
      • Multiple data types; multiple scales; multiple databases:
          • Whole brain data (20 µm microscopic MRI)
          • Mosaic LM images (1 GB+)
          • Conventional LM images
          • Individual cell morphologies
          • EM volumes & reconstructions
          • Solved molecular structures
      • No single technology serves these all equally well.
      • Neuroscience is unlikely to be served by a few large databases like the genomics and proteomics community.
  • 5. NIF is an initiative of the NIH Blueprint consortium of institutes (http://neuinfo.org)
      • What types of resources (data, tools, materials, services) are available to the neuroscience community?
      • How many are there?
      • What domains do they cover? What domains do they not cover?
      • Where are they?
          • Web sites
          • Databases
          • Literature
          • Supplementary material
          • PDF files
          • Desk drawers
      • Who uses them?
      • Who creates them?
      • How can we find them?
      • How can we make them better in the future?
  • 6. We need more databases (?)
      • NIF Registry: a catalog of neuroscience-relevant resources
      • > 5000 currently listed
      • > 2000 databases
      • And we are finding more every day
  • 7. But we have Google!
      • Current web is designed to share documents
      • Documents are unstructured data
      • Much of the content of digital resources is part of the “hidden web”
      • Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
  • 8. NIF must work with the ecosystem as it is today
      • NIF has developed a production technology platform for researchers to discover, share, access, analyze, and integrate neuroscience-relevant information
      • Semantically enabled search engine and interface that customizes results for neuroscience
      • System that searches the “hidden web”, i.e., content not well served by search engines
      • Data resources are predominantly relational, XML, text, RDF, OWL
      • Automated data harvesting technologies that produce dynamic indices of data content, including databases, web pages, text, XML, etc.
      • Tools to make products and data available
      • Designed to be populated rapidly; set up process for progressive refinement
  • 9. NIF accomplishments
      • Participating institutions: UCSD, Yale, Caltech, George Mason, Washington Univ
      • Assembled the largest searchable collation of neuroscience data on the web:
          • Data federation
          • Resource registry (materials, data, tools, services)
          • PubMed literature
          • Full text of open access
      • The largest ontology for neuroscience
      • NIF search portal: simultaneous search over data, NIF catalog and biomedical literature
      • Neurolex Wiki: a community wiki serving neuroscience concepts
      • A unique technology platform
      • A reservoir of cross-disciplinary biomedical data expertise
      • NIF is poised to capitalize on the new tools and emphasis on big data and open science
  • 10. NIF data federation
      • > 180 sources; 350 M records: NIF was designed to be populated rapidly, with progressive refinement of data
      • [Chart: Percentage of data records per data type. Microarray: 98%]
      • [Chart: Percentage of data records per data type, everything but microarray. Categories: Brain activation foci, Animals, Images, Pathways, Drugs, Connectivity, Antibodies, Grants]
  • 11. What do you mean by data? Databases come in many shapes and sizes
      • Primary data: data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
      • Secondary data: data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD); gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
      • Tertiary data: claims and assertions about the meaning of data, e.g., gene upregulation/downregulation
      • Registries: metadata; pointers to data sets or materials stored elsewhere
      • Data aggregators: aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
      • Single source: data acquired within a single context, e.g., Allen Brain Atlas
      • Researchers are producing a variety of information artifacts using a multitude of technologies
  • 12. What types of questions can I ask? We’d like to be able to find:
      • What is known:
          • What is the average diameter of a Purkinje neuron?
          • Is GRM1 expressed in cerebral cortex?
          • What are the projections of hippocampus?
          • What genes have been found to be upregulated in chronic drug abuse in adults?
          • Is there a database of fMRI studies?
          • What studies used my polyclonal antibody against GABA in humans?
          • What rat strains have been used most extensively in research during the last 20 years?
      • What is not known:
          • Connections among data
          • Gaps in knowledge
      • Without some sort of framework, this is very difficult to do
  • 13. What are the connections of the hippocampus?
      • Example query: Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
      • Query expansion: synonyms and related concepts
      • Boolean queries
      • Data sources categorized by “data type” and level of nervous system
      • Tutorials for using the full resource when getting there from NIF
      • Common views across multiple sources
      • Link back to record in original source
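The query expansion on this slide can be sketched in a few lines of Python. This is a minimal illustration, not NIF's implementation: the `SYNONYMS` table and the `expand_query` function are hypothetical names, and the only synonyms used are the ones the slide itself lists.

```python
# Hypothetical synonym table holding only the variants named on the slide.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term, synonyms=SYNONYMS):
    """Return a Boolean OR query made of the term and its known synonyms."""
    variants = [term] + synonyms.get(term.lower(), [])
    # Quote multi-word variants so a search engine treats them as phrases.
    quoted = ['"%s"' % v if " " in v else v for v in variants]
    return " OR ".join(quoted)


print(expand_query("Hippocampus"))
# prints: Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"
```

A term with no entry in the table passes through unchanged, so the expansion never narrows a search.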
  • 14. Results are organized within a common framework
      • [Figure: relation labels used between a source site and a target site: Synapsed by, innervates, Connects to, Input region, Synapsed with, Cellular contact, Projects to, Axon innervates, Subcellular contact]
      • Each resource implements a different, though related, model; in many cases the systems are complex and difficult to learn
  • 15. The scourge of neuroanatomical nomenclature: Importance of NIF semantic framework
      • NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
          • Brain Architecture Management System (rodent)
          • Temporal lobe.com (rodent)
          • Connectome Wiki (human)
          • Brain Maps (various)
          • CoCoMac (primate cortex)
          • UCLA Multimodal database (human fMRI)
          • Avian Brain Connectivity Database (bird)
      • Total: 1800 unique brain terms (excluding Avian)
      • Number of exact terms used in > 1 database: 42
      • Number of synonym matches: 99
      • Number of 1st order partonomy matches: 385
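To make the difference between exact and synonym matches concrete, here is a small sketch of how such counts could be computed for two vocabularies. Everything in it is an assumption for illustration: the term lists, the `EX:0001` identifier and the function names are invented, and the slide does not say how NIF derived its figures.

```python
def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def count_matches(vocab_a, vocab_b, synonym_ids):
    """Return (exact, synonym) match counts between two term lists.

    synonym_ids maps a normalized term to a shared identifier (hypothetical).
    """
    a = {normalize(t) for t in vocab_a}
    b = {normalize(t) for t in vocab_b}
    exact = a & b
    # Terms that differ as strings but resolve to the same identifier.
    ids_a = {synonym_ids[t] for t in a - exact if t in synonym_ids}
    ids_b = {synonym_ids[t] for t in b - exact if t in synonym_ids}
    return len(exact), len(ids_a & ids_b)


# Toy vocabularies, not taken from any of the seven databases.
db1 = ["Hippocampus", "Cornu Ammonis", "Cerebellum"]
db2 = ["hippocampus", "Ammon's horn", "Striatum"]
synonym_ids = {"cornu ammonis": "EX:0001", "ammon's horn": "EX:0001"}

print(count_matches(db1, db2, synonym_ids))
# prints: (1, 1)
```

String comparison alone finds one shared term here; the second match only appears once both strings are resolved to the same identifier.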
  • 16. NIF’s minimum requirements for effective data sharing
      • You (and the machine) have to be able to find it
          • Accessible through the web
          • Annotations
      • You have to be able to use it
          • Data type specified and in a usable form
      • You have to know what the data mean
          • Some semantics
          • Context: experimental metadata
          • Provenance: where did the data come from?
      • Reporting neuroscience data within a consistent framework helps enormously
  • 17. What is an ontology?
      • Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine-readable form
      • Branch of philosophy: a theory of what is
      • e.g., Gene ontologies
      • [Diagram: Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron]
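The diagram on this slide can be written down as subject/relation/object triples, which is one simple way to put such knowledge into machine-readable form. The sketch below is illustrative only: the `has_a` and `is_a` relation names and the `parts_of` helper are assumptions, and plain strings stand in for the unique identifiers a real ontology would use.

```python
# Each triple is (subject, relation, object); labels follow the slide's diagram.
TRIPLES = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje Cell Layer"),
    ("Purkinje Cell Layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "neuron"),
]


def parts_of(whole, triples=TRIPLES):
    """Return every part reachable from `whole` by following has_a edges."""
    found = []
    frontier = [whole]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in triples:
            if subj == current and rel == "has_a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


print(parts_of("Brain"))
# prints: ['Cerebellum', 'Purkinje Cell Layer', 'Purkinje cell']
```

Because the traversal follows the relation transitively, a query about the brain reaches the Purkinje cell even though no single triple links the two directly.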
  • 18. You need to use ontology identifiers instead of strings
      • Blah, blah, ontology blah
      • “Ontology as mathematics, computer science or Esperanto” (Andrey Rzhetsky and James A. Evans)
  • 19. What can ontology do for us? “Esperanto!”
      • Express neuroscience concepts in a way that is machine readable
          • Classes are identified by unique identifiers
          • Synonyms, lexical variants
          • Definitions
      • Provide means of disambiguation of strings
          • Nucleus part of cell; nucleus part of brain; nucleus part of atom
      • Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases GABA as a neurotransmitter
          • Properties
      • Provide universals for navigating across different data sources
          • Semantic “index”
      • Perform reasoning
      • Link data through relationships, not just one-to-one mappings
      • “Concept-based queries”
  • 20. Power of unique identifiers: Are you the M Martone who...
      • The Gene Wiki: community intelligence applied to human gene annotation. Huss JW 3rd, Lindenbaum P, Martone M, Roberts D, Pizarro A, Valafar F, Hogenesch JB, Su AI. Nucleic Acids Res. 2010 Jan;38(Database issue):D633-9.
      • Ontologies for Neuroscience: What are they and What are they Good for? Larson SD, Martone ME. Front Neurosci. 2009 May;3(1):60-7. Epub 2009 May 1.
      • Three-dimensional electron microscopy reveals new details of membrane systems for Ca2+ signaling in the heart. Hayashi T, Martone ME, Yu Z, Thor A, Doi M, Holst MJ, Ellisman MH, Hoshijima M. J Cell Sci. 2009 Apr 1;122(Pt 7):1005-13.
      • Some analyses of forgetting of pictorial material in amnesic and demented patients. Martone M, Butters N, Trauner D. J Clin Exp Neuropsychol. 1986 Jun;8(3):161-78.
      • Traumatic brain injury and the goals of care. Martone M. Hastings Cent Rep. 2006 Mar-Apr;36(2):3.
      • Three-dimensional pattern of enkephalin-like immunoreactivity in the caudate nucleus of the cat. Groves PM, Martone M, Young SJ, Armstrong DM. J Neurosci. 1988 Mar;8(3):892-900.
  • 21. I am not a number (but I should be)
      • Full URI (Uniform Resource Identifier): http://orcid.org/1234567
      • Label: Maryann Elizabeth Martone
      • Synonym: ME Martone, M M Martone; Martone, Maryann
      • Abbreviation: MEM
      • Is a
      • Has a
      • Is that entity which has these properties
      • [Diagram labels: Dept of Psychiatry, UCSD; Boston VA Hospital; Female; Nelson Butters; Publications]
      • Text mining algorithms can discover a lot of things about me
      • ORCID project: Author IDs
  • 22. NIF Semantic Framework: NIFSTD ontology
      • [Diagram: NIFSTD modules: Organism, Anatomical Structure, Cell, Subcellular structure, Molecule, Macromolecule, Gene, Molecule Descriptors, NS Function, Dysfunction, Quality, Investigation, Techniques, Resource, Instrument, Reagent, Protocols]
      • NIF covers multiple structural scales and domains of relevance to neuroscience
      • Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
      • Simple, basic “is a” hierarchies that can be used “as is” or to form the building blocks for more complex representations
  • 23. “We studied the behavior of CA2-binding proteins in Ca2 neurons under high and low Ca2 conditions”
      • NIF queries across 170+ independent databases
      • [Sources shown: BioGrid, Allen Brain Atlas, Brain Info]
  • 24. But you don’t have what I need!
      • Provide a simple framework for defining the concepts required
          • Cell, part of brain, subcellular structure, molecule
      • Community based:
          • Communities contribute their vocabularies
          • Reconcile and align concepts used by different domains
          • Each concept gets its own unique identifier
      • Creating a computable index for neuroscience data
      • INCF Demo D03, http://neurolex.org (Stephen Larson/INCF)
  • 25. Concept-based search: search by meaning • Search Google: GABAergic neuron • Search NIF: GABAergic neuron • NIF automatically searches for types of GABAergic neurons
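One way to picture the concept-based search on this slide is query expansion over an “is a” hierarchy: the search term is replaced by the term plus all of its subtypes before matching. The sketch below is only an illustration under stated assumptions. The toy hierarchy, the function names `expand_query` and `concept_search`, and the substring matching are all hypothetical, and none of it is NIF's actual implementation or vocabulary.

```python
from collections import deque

# Hypothetical toy "is a" hierarchy: child term -> parent term.
# The terms are illustrative and are not copied from NIFSTD.
IS_A = {
    "Cerebellar Purkinje cell": "GABAergic neuron",
    "Neostriatal medium spiny neuron": "GABAergic neuron",
    "Cortical basket cell": "GABAergic neuron",
    "GABAergic neuron": "Neuron",
    "Cerebellar granule cell": "Glutamatergic neuron",
    "Glutamatergic neuron": "Neuron",
}


def expand_query(term, is_a=IS_A):
    """Return the term followed by every subtype reachable via 'is a' links."""
    # Invert the child -> parent map into parent -> [children].
    children = {}
    for child, parent in is_a.items():
        children.setdefault(parent, []).append(child)

    expanded, queue = [term], deque([term])
    while queue:
        current = queue.popleft()
        for child in sorted(children.get(current, [])):
            if child not in expanded:
                expanded.append(child)
                queue.append(child)
    return expanded


def concept_search(term, records, is_a=IS_A):
    """Return records that mention the term or any of its subtypes."""
    terms = [t.lower() for t in expand_query(term, is_a)]
    return [r for r in records if any(t in r.lower() for t in terms)]
```

A plain string search for "GABAergic neuron" would miss a record that only says "Purkinje cell"; the expanded query finds it because the hierarchy records that a Purkinje cell is a GABAergic neuron.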
  • 26. Esperanto! • “The trouble is that if I make up all of my own URIs, my [data] has no meaning to anyone else unless I explain what each URI is intended to denote or mean. Two [data sets] with no URIs in common have no information that can be interrelated.” (http://www.rdfabout.com/intro/#Introducing%20RDF) • NIF favors reuse of identifiers rather than mapping • NIF imports many ontologies • Creating ontologies to be used as common building blocks: modularity and low semantic overhead are important • Many community ontologies available covering multiple domains • NIFSTD available via web services • BioPortal (http://bioportal.bioontology.org/)
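The point of the quotation, that data sets can only be interrelated through identifiers they share, can be sketched as a simple join. Everything in the example is an assumption made for illustration: the records, the field names, and the `ex:` identifiers are placeholders and are not real ontology IDs or NIF data.

```python
# Hypothetical records from two independent sources. Both reuse the same
# placeholder identifiers ("ex:0001", ...) for brain regions.
expression_db = [
    {"region_id": "ex:0001", "gene": "Grm1"},
    {"region_id": "ex:0002", "gene": "Penk"},
]
connectivity_db = [
    {"region_id": "ex:0001", "projects_to": "ex:0003"},
    {"region_id": "ex:0004", "projects_to": "ex:0001"},
]


def join_on_identifier(left, right, key):
    """Combine rows from two sources that carry the same identifier under `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


linked = join_on_identifier(expression_db, connectivity_db, "region_id")
```

Only the rows that share `ex:0001` can be combined. If each source had minted its own identifier for the same region, the join would return nothing until someone built and maintained a mapping between the two vocabularies.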
  • 27. NIF Analytics: The Neuroscience Ecosystem • Where are the data? [Figure: data sources plotted against brain regions, e.g., striatum, hypothalamus, olfactory bulb, cerebral cortex] • NIF is in a unique position to answer questions about the neuroscience ecosystem • Vadim Astakhov, Kepler Workflow Engine
  • 28. Whither neuroscience information? • What is potentially knowable: ∞ • What is known: literature, images, human knowledge (unstructured; natural language processing, entity recognition, image processing and analysis; communication) • What is easily machine processable and accessible
  • 29. Open world meets closed world • But... NIF has > 900,000 antibodies, 250,000 model organisms, and 3 million microarray records • Query for “reference” brain structures and their parts in NIF Connectivity database
  • 30. Gender bias • NIF can start to answer interesting questions about neuroscience research, not just about neuroscience • NIF Reports: Male vs Female
  • 31. What have we learned: Grabbing the long tail of small data • Analysis of NIF shows multiple databases with similar scope and content • Many contain partially overlapping data • Data “flows” from one resource to the next • Data is reinterpreted, reanalyzed or added to • Is duplication good or bad?
  • 32. Embracing duplication: Data Mash ups • NIF queries across 3 of approximately 10 fMRI databases • ~300 PMIDs were common between Brede and SUMSdb • PMID serves as a unique identifier for an article • Same information; value added • Same data; different aspects
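Because the PMID identifies an article uniquely, finding the overlap between two databases reduces to a set intersection, and a mash-up is a merge keyed on that identifier. The sketch below assumes each database can be exported as a list of records with a `pmid` field. The record contents, the other field names, and the PMID values are made-up placeholders, not data from Brede or SUMSdb.

```python
def shared_pmids(records_a, records_b):
    """Return the set of PMIDs that appear in both record lists."""
    return {r["pmid"] for r in records_a} & {r["pmid"] for r in records_b}


def mash_up(records_a, records_b):
    """Pair up the records that two sources hold for the same article."""
    by_pmid_b = {r["pmid"]: r for r in records_b}
    merged = {}
    for record in records_a:
        other = by_pmid_b.get(record["pmid"])
        if other is not None:
            merged[record["pmid"]] = {"source_a": record, "source_b": other}
    return merged


# Placeholder records: the PMIDs and fields are invented for illustration.
source_a = [
    {"pmid": "10000001", "coordinates": [(-42, 18, 24)]},
    {"pmid": "10000002", "coordinates": [(30, -60, 48)]},
]
source_b = [
    {"pmid": "10000002", "surface_map": "map_17"},
    {"pmid": "10000003", "surface_map": "map_18"},
]

overlap = shared_pmids(source_a, source_b)
combined = mash_up(source_a, source_b)
```

In this toy case the one shared article ends up with both its coordinates and its surface map attached, which is the “same data; different aspects” situation the slide describes.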
  • 33. Same data: different analysis • Chronic vs acute morphine in striatum • Gemma: Gene ID + Gene Symbol • DRG: Gene name + Probe ID • Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases • Analysis: • 1370 statements from Gemma regarding gene expression as a function of chronic morphine • 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis • Results for 1 gene were opposite in DRG and Gemma • 45 did not have enough information provided in the paper to make a judgment
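The comparison on this slide depends on one normalization step: because the two databases report change against opposite baselines, a claim in one has to be flipped before it is compared with the other. A minimal sketch of that step follows, assuming the gene identifiers have already been mapped to a common symbol (the slide notes that the two sources key their records differently). The gene names, the `up`/`down` encoding, and the function names are hypothetical.

```python
def flip(direction):
    """Invert a direction of change to account for a swapped baseline."""
    return {"up": "down", "down": "up"}[direction]


def compare_claims(gemma, drg):
    """Compare direction-of-change claims from two sources.

    gemma: gene -> direction relative to the chronic-morphine baseline
    drg:   gene -> direction relative to the saline baseline
    A Gemma claim agrees with DRG when its flipped direction matches.
    """
    summary = {"consistent": [], "opposite": [], "missing": []}
    for gene, direction in gemma.items():
        if gene not in drg:
            summary["missing"].append(gene)
        elif flip(direction) == drg[gene]:
            summary["consistent"].append(gene)
        else:
            summary["opposite"].append(gene)
    return summary


# Placeholder claims; the gene names are invented for illustration.
gemma_claims = {"GeneA": "up", "GeneB": "down", "GeneC": "up"}
drg_claims = {"GeneA": "down", "GeneB": "down"}
result = compare_claims(gemma_claims, drg_claims)

# Arithmetic from the slide: 617 of 1370 statements were consistent.
unconfirmed_fraction = (1370 - 617) / 1370
```

With the slide's counts, 753 of the 1370 statements (about 55%) were not confirmed, which matches the “over half” on the slide.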
  • 34. Taking a global view on data: microculture to ecosystem • Several powerful trends should change the way we think about our data: One → Many • Many data: generation of data is getting easier; shared data; data space is getting richer (more -omes every day), but compared to the biological space, still sparse • Many eyes: wisdom of crowds; more than one way to interpret data • Many algorithms: not a single way to analyze data • Many analytics: “signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting • Are you exposing or burying your work?
  • 35. The future of scientific communication • We have learned over the years how to write a scientific paper for other humans to read and for other agents to index • We now have to learn how to write papers for automated agents (and their humans) to mine • We have learned over the years to report data in papers for humans to read • We now have to learn how to publish data in a form and on a suitable platform for automated agents (and their humans) to mine • Reporting neuroscience data within a consistent framework helps enormously [Image labels: Printing press; Linked data cloud; Watson]
  • 36. Why does it matter? • 47/50 major preclinical published cancer studies could not be replicated • “The scientific community assumes that the claims in a preclinical study can be taken at face value-that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.” • “There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.” • Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data • Begley and Ellis, Nature, 29 March 2012, Vol 483, p. 531 • Data, not just stories about them!
  • 37. Register your resource to NIF! • 1. “How do I share my data?” • 2. “There is no database for my data” • 3. Community database: beginning • 4. Community database: end • NIF is designed to leverage existing investments in resources and infrastructure [Diagram labels: Institutional repositories; Cloud; INCF: Global infrastructure; Education; Industry; Government]
  • 38. It’s a messy ecosystem (and that’s OK) • NIF favors a hybrid, tiered, federated system • Domain knowledge: ontologies (Gene, Organism, Neuron, Brain part, Disease) • Claims about results: Virtuoso RDF triples (Caudate projects to SNpc; Grm1 is upregulated in chronic cocaine; Betz cells degenerate in ALS) • Data: data federation • Workflows • Narrative
  • 39. Future of Research Communications and e-Scholarship • FORCE11: http://force11.org • Founded by Phil Bourne, Tim Clark, Ed Hovy, Anita de Waard and Ivan Herman • Bring together stakeholders with an interest in moving scholarly communication beyond reliance on papers and traditional impact metrics • Beyond the PDF 2: Spring 2013
  • 40. NIF team (past and present): Jeff Grethe, UCSD, Co Investigator, Interim PI; Fahim Imam, NIF Ontology Engineer; Amarnath Gupta, UCSD, Co Investigator; Larry Lui; Anita Bandrowski, NIF Project Leader; Andrea Arnaud Stagg; Gordon Shepherd, Yale University; Jonathan Cachat; Perry Miller; Jennifer Lawrence; Luis Marenco; Lee Hornbrook; Rixin Wang; Binh Ngo; David Van Essen, Washington University; Vadim Astakhov; Erin Reid; Xufei Qian; Paul Sternberg, Cal Tech; Chris Condit; Arun Rangarajan; Mark Ellisman; Hans Michael Muller; Stephen Larson; Yuling Li; Willie Wong; Giorgio Ascoli, George Mason University; Tim Clark, Harvard University; Sridevi Polavarum; Paolo Ciccarese; Karen Skinner, NIH, Program Officer
  • 41. Why do we create so many overlapping products? • Science is incremental; we build on the results of others • “That which I cannot build, I cannot understand”: don’t trust any data you haven’t generated; “Oh, now I see what you are saying”; scientists know the domain, not informatics • It’s ingrained in our culture: “Build a better mousetrap and the world will beat down our doors”; little credit for making someone else’s product better • “Yes, we are planning to do that...”: we are all time and resource constrained; we extend projects in time • There’s more than one way to skin a cat...: we are still mastering the medium; technology is developing fast
  • 42. When I talk to resource providers, neuroscientists (and journal editors)... • “You need to use ontology identifiers instead of strings” • “Blah, blah, ontology, blah”

Editor's Notes

  1. Doesn’t do it well; doesn’t organize the results in a domain-specific way; doesn’t search across it. For use as content goal: dynamic inventory for deep coverage of neuroscience data: Genes -> Systems
  2. Should this say collation or collection?