
Martone grethe


  1. Methodologies for Long-Tail Data Sharing: What Have We Learned? Maryann E. Martone, Ph.D., University of California, San Diego and Hypothesis; Jeffrey S. Grethe, Ph.D., University of California, San Diego
  2. NIF is an initiative of the NIH Blueprint consortium of institutes. NIF has been tracking and cataloging the biomedical resource landscape since 2008. [Figure: resource types by year; types shown: Database, Software Application, Data Analysis Service, Topical Portal, Core Facility, Ontology, Software Resource]
  3. The current “Addictome”. Query: Addiction. NIF searches across:
      • Resource Registry (13,000+)
      • > 200 deeply integrated data sources (> 800 million records)
      • Literature
  4. e-Science Ecosystem. The digital world runs on globally unique and persistent identifiers (PIDs); PIDs serve as a “key” for identifying the same entity across different contexts. eScience goal: make data Findable, Accessible, Interoperable, Re-usable (FAIR) for both human and machine. [Diagram labels: PID, ORCID, RRID, DOI, Data, People, Research resources, Ontology, Concepts, Protocols, Metadata standards, Minimal Information Models, CDE, Aggregator, Translation, Non-digital, Repositories and Registries (e.g. NIF, Monarch), NIH Data Discovery Index]
  5. Resource Identification Initiative: supplying unique identifiers for key research resources. Without an identifier: “The following antibodies were used for immunoblotting: -actin mAb (1:10,000 dilution, Sigma-Aldrich)…” versus with an identifier: “The following antibodies were used for immunoblotting: -actin mAb (1:10,000 dilution, Sigma-Aldrich, RRID:AB_262137)…” Resolver: https://scicrunch.org/resolver/RRID:AB_262137
  6. Minimal Information Standards. A set of guidelines for reporting data that ensures the data can be easily verified, analysed and clearly interpreted by the wider scientific community. The recommendations also provide a foundation for structured databases, public repositories and development of data analysis tools (https://en.wikipedia.org/wiki/Minimum_Information_Standards). Example: MINI, Minimum Information about a Neuroscience Investigation (http://precedings.nature.com/documents/1720/version/1, http://precedings.nature.com/documents/1720/version/1/files/npre20081720-1.pdf). [Diagram labels: MIM; CDE 1, CDE 2, …, CDE N; Value Set]
  7. Common Data Elements. A data element that is common to multiple datasets and is used to improve data quality and promote data sharing. CDEs usually describe the following data element properties: Name, Definition, Instructions, Provenance, Value Set. See https://cde.nlm.nih.gov/home and http://www.nlm.nih.gov/cde/
  8. Value Sets. The set of possible values or responses. A value set often includes concepts from established vocabularies, ontologies or data standards. A value set may also include a range of permissible values and indicate the required units. For a survey question, the value set may be a list of possible responses. Example concept: http://neurolex.org/wiki/Category:Hippocampus_CA1_pyramidal_cell
  9. Neuroscience Information Framework. “a tool for analyzing and structuring information”; “a reduction in uncertainty”
      • Ontologies are the major way that NIF searches for and organizes information
      • Aggregate of community ontologies, e.g., Gene Ontology, ChEBI, Protein Ontology
      • Still significant gaps for behavioral and physiological concepts and techniques
      • Available as services through NIF so they can be built into applications
      [Diagram: NIFSTD modules: Organism, Molecule, Macromolecule, Gene, Molecule Descriptors, Cell, Resource, Instrument, Dysfunction, Quality, Anatomical Structure, NS Function, Subcellular structure, Investigation, Protocols, Reagent, Techniques]
  10. Concept-based query. Ontologies and their relationships let us probe the data space for related concepts. [Screenshot label: Remove synonyms]
  11. What have we learned?
      • The landscape is vibrant, dynamic and growing, but also littered with abandoned and unrealized projects
      • Data belongs in a data repository, not on your lab server
      • People are important in this endeavor: leaders, curators, community engagement specialists
      • Data and ontology resources become interesting when they are comprehensive: populate!
      • Assume that you will be resource limited and plan accordingly: time, money, personnel
      • Cost-benefit analysis: what to do now vs. later
      • Technology will improve
      • Don’t start from square one: resources exist to help; help support them
  12. Extra Slides
  13. Dimensions of FAIR data sharing
      • Discoverability
        – Data can be found
        – Data set has an identifier and links are stable
      • Accessibility
        – Data can be accessed programmatically
        – Access rights are clear
      • Assessability
        – Provenance is known
        – Reliability can be determined
      • Understandability
        – The data can be understood
      • Usability
        – The data are actionable
        – Data are not in a proprietary format
      Sources: Goodman, A. et al. Ten simple rules for the care and feeding of scientific data. PLoS Comput Biol 10, e1003542, doi:10.1371/journal.pcbi.1003542 (2014). Science as an open enterprise, Royal Society: https://royalsociety.org/policy/projects/science-public-enterprise/Report/
  14. FORCE11: Future of Research Communications and e-Scholarship
      • Resource Identification Initiative: https://www.force11.org/group/resource-identification-initiative
      • FAIR Data Guiding principles: https://www.force11.org/group/fairgroup/fairprinciples
      • Data Citation Principles: https://www.force11.org/group/joint-declaration-data-citation-principles-final
      • On creating machine-readable data citations: https://peerj.com/articles/cs-1/
      • 10 Simple rules for design, provision, and reuse of persistent identifiers for life science data: https://zenodo.org/record/18003#.VeOxxLQjvyA
      FORCE11.org: Grass roots organization dedicated to transforming scholarship through
  15. Mapping the data landscape: anatomical framework. ~800 million records across ~200 databases or views. [Figure: Data Sources by region (Forebrain, Midbrain, Hindbrain); legend bins: 0, 1-10, 11-100, >101]
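
Slide 5 contrasts a methods sentence with and without an RRID. As a rough illustration of why the identifier helps machines, the sketch below pulls RRIDs out of free text and builds resolver links of the form shown on the slide. The regular expression and the function names `find_rrids` and `resolver_url` are assumptions made for this example, not part of any official RRID tooling, and the pattern may not cover every RRID format.

```python
import re

# Assumed pattern: "RRID:" followed by a letter prefix, an underscore,
# and an alphanumeric identifier (e.g. RRID:AB_262137).
RRID_PATTERN = re.compile(r"RRID:([A-Za-z]+_[A-Za-z0-9_-]+)")

# Resolver base taken from the URL shown on slide 5.
RESOLVER_BASE = "https://scicrunch.org/resolver/"


def find_rrids(text):
    """Return every RRID found in the text, in order of appearance."""
    return ["RRID:" + match for match in RRID_PATTERN.findall(text)]


def resolver_url(rrid):
    """Build a resolver link for one RRID string such as 'RRID:AB_262137'."""
    return RESOLVER_BASE + rrid


methods = (
    "The following antibodies were used for immunoblotting: -actin mAb "
    "(1:10,000 dilution, Sigma-Aldrich, RRID:AB_262137)"
)
rrids = find_rrids(methods)
urls = [resolver_url(r) for r in rrids]
print(urls)
```

The version of the sentence without an RRID yields an empty list, which is the point of the slide: without the identifier there is nothing for software to resolve.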
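
Slides 7 and 8 list the properties of a common data element (Name, Definition, Instructions, Provenance, Value Set) and describe a value set as a list of permitted concepts, optionally with a numeric range and units. One possible way to model that is sketched below. The class names, field names, and the two example elements (`cell_type`, `age`) are hypothetical and chosen only for illustration; they are not taken from the NIH CDE repository.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ValueSet:
    """Permitted values: a list of concepts and/or a numeric range with units."""
    permitted: FrozenSet[str] = frozenset()
    numeric_range: Optional[Tuple[float, float]] = None
    units: Optional[str] = None

    def accepts(self, value) -> bool:
        # Strings must be one of the permitted concepts.
        if isinstance(value, str):
            return value in self.permitted
        # bool is a subclass of int in Python; reject it explicitly.
        if isinstance(value, bool):
            return False
        # Numbers must fall inside the inclusive range, if one is defined.
        if isinstance(value, (int, float)) and self.numeric_range is not None:
            low, high = self.numeric_range
            return low <= value <= high
        return False


@dataclass(frozen=True)
class CommonDataElement:
    """The five properties named on slide 7."""
    name: str
    definition: str
    instructions: str
    provenance: str
    value_set: ValueSet

    def validate(self, value) -> bool:
        return self.value_set.accepts(value)


# Hypothetical elements, for illustration only.
cell_type = CommonDataElement(
    name="cell_type",
    definition="Type of neuron recorded from",
    instructions="Choose one concept from the value set",
    provenance="example",
    value_set=ValueSet(permitted=frozenset({"Hippocampus CA1 pyramidal cell"})),
)
age = CommonDataElement(
    name="age",
    definition="Age of the subject",
    instructions="Report in weeks",
    provenance="example",
    value_set=ValueSet(numeric_range=(0, 200), units="weeks"),
)

print(cell_type.validate("Hippocampus CA1 pyramidal cell"))
print(age.validate(12))
```

Keeping the value set as its own object means several data elements can share one vocabulary, which matches the slide's point that value sets often draw on established ontologies.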
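
Slide 10 says that ontologies and their relationships let a query reach related concepts. The sketch below shows the general idea of concept-based query expansion on a toy ontology: a search term is expanded to its synonyms and to its child concepts. The dictionary layout, the `expand_query` function, and the `include_synonyms` switch (loosely modelled on the "Remove synonyms" label on the slide) are assumptions for this example and do not describe how NIF implements its search.

```python
# Toy ontology for illustration: each concept lists synonyms and child concepts.
TOY_ONTOLOGY = {
    "brain": {"synonyms": [], "children": ["forebrain", "midbrain", "hindbrain"]},
    "forebrain": {"synonyms": ["prosencephalon"], "children": []},
    "midbrain": {"synonyms": ["mesencephalon"], "children": []},
    "hindbrain": {"synonyms": ["rhombencephalon"], "children": []},
}


def expand_query(term, ontology, include_synonyms=True):
    """Return the term plus its descendants (and optionally their synonyms)."""
    expanded = []
    seen = set()
    stack = [term]
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue  # guard against cycles or repeated children
        seen.add(concept)
        expanded.append(concept)
        entry = ontology.get(concept, {})
        if include_synonyms:
            expanded.extend(entry.get("synonyms", []))
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(entry.get("children", [])))
    return expanded


with_synonyms = expand_query("brain", TOY_ONTOLOGY)
without_synonyms = expand_query("brain", TOY_ONTOLOGY, include_synonyms=False)
print(with_synonyms)
print(without_synonyms)
```

A term that is missing from the ontology is returned unchanged, so the expansion degrades to a plain keyword search rather than failing.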

Editor's Notes

  • Figure X: Resource types and year added to the registry. Research resources are each tagged with one or more resource types; the most common are represented in this graph (for all data see http://neurolex.org/wiki/Resource_Type_Hierarchy). The year that a resource was added to the registry is denoted by color; note that data from 2009 and earlier are lumped into 2010.
