Pascale Gaudet
Chair, International Society for Biocuration
Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics
BioDBCore: Current status
and future developments
International Society for Biocuration:
Mission statement
•  Define and promote the work of biocurators
•  Foster connections with user communities to
ensure that databases and accompanying
tools meet specific user needs
•  Promote communication and exchanges
between curators: meetings, workshops
•  Encourage best practices by providing
documentation on standards and annotation
procedures
ISB
The need
• Databases: improve data integration from
published papers
• Journals: link to database objects
• Researchers: identify resources
• Grant submitters: enforce data sharing plans
Goals
1)  Gather information required to provide a
general overview of the database
landscape and compare the various
resources
2)  Encourage consistency and interoperability
3)  Promote the use of standards
4)  Provide guidance for users
5)  Maximize the collective impact of the
resources
BioDBcore group organization
•  Led by Pascale Gaudet (ISB/SIB) and
Philippe Rocca-Serra (BioSharing)
•  Guidelines proposed in 2011 paper
•  Implemented in 2012 NAR database issue
Use cases
•  Show all resources of type database which use
MIMARKS guidelines
•  Show all resources where John Smith is involved
•  Show all resources for mouse phenotypes
•  Where can I submit my data?
and also: 
•  Guidance for grants’ data sharing policies
•  Improving integration of data from papers into
databases
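The query-style use cases above can be sketched as simple filters over a list of resource descriptions. This is only an illustration under stated assumptions: the record layout, the key names (`type`, `standards`, `people`, `taxa`, `scope`, `accepts_submissions`) and both example entries are hypothetical, not the BioDBCore data model or real registry content.

```python
from typing import Dict, List

# Hypothetical registry entries; names and values are invented for illustration.
RECORDS: List[Dict] = [
    {
        "name": "ExampleMouseDB",
        "type": "database",
        "standards": ["MIMARKS"],
        "people": ["John Smith"],
        "taxa": ["Mus musculus"],
        "scope": ["phenotypes"],
        "accepts_submissions": True,
    },
    {
        "name": "ExampleSequenceTool",
        "type": "tool",
        "standards": ["FASTA"],
        "people": ["Jane Doe"],
        "taxa": ["Homo sapiens"],
        "scope": ["sequences"],
        "accepts_submissions": False,
    },
]


def names(records: List[Dict]) -> List[str]:
    """Return just the resource names, in input order."""
    return [r["name"] for r in records]


def using_standard(records: List[Dict], standard: str,
                   resource_type: str = "database") -> List[Dict]:
    """Resources of a given type that declare a given standard."""
    return [r for r in records
            if r["type"] == resource_type and standard in r["standards"]]


def involving(records: List[Dict], person: str) -> List[Dict]:
    """Resources in which a named person is involved."""
    return [r for r in records if person in r["people"]]


def covering(records: List[Dict], taxon: str, topic: str) -> List[Dict]:
    """Resources that cover a taxon and a scope keyword."""
    return [r for r in records if taxon in r["taxa"] and topic in r["scope"]]


def accepting_submissions(records: List[Dict]) -> List[Dict]:
    """Resources where data can be submitted."""
    return [r for r in records if r["accepts_submissions"]]


print(names(using_standard(RECORDS, "MIMARKS")))
print(names(involving(RECORDS, "John Smith")))
print(names(covering(RECORDS, "Mus musculus", "phenotypes")))
print(names(accepting_submissions(RECORDS)))
```

Each use case maps to one filter, which suggests that consistently filled descriptors are enough to answer these questions without free-text search.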
Collaborative philosophy
•  Many groups/resources have been providing
registries and lists of databases
•  Often not funded, not maintained
•  BioDBCore seeks to collaborate with all interested
parties to work together to provide a more
permanent solution to database descriptions
BioDBcore: Participating groups
•  BioDB100
•  BioSharing
•  BioCatalogue
•  Bioinformatics Links Directory
•  Biositemaps
•  CASIMIR
•  MIBBI
•  MIRIAM
•  Model Organism Databases
•  NIF registry
•  … and your group!
BioDBCore descriptors

 1.  Database name
2.  Main resource URL
3.  Contact information (e-mail; postal mail)
4.  Date resource established (year)
5.  Conditions of use (Free, or type of license)
6.  Scope: data types captured, curation policy,
standards used
7.  Standards: MIs, Data formats, Terminologies
8.  Taxonomic coverage
9.  Data accessibility/output options
10.  Data release frequency
11.  Versioning policy and access to historical files
12.  Documentation available
13.  User support options
14.  Data submission policy
15.  Relevant publications
16.  Resource’s Wikipedia URL
17.  Tools available
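As a rough sketch, the 17 descriptors could be held in a single typed record. The class name, attribute names and types below are assumptions chosen for illustration; they are not the BioDBCore or BioSharing schema. The example values are taken from the dictyBase entry shown in these slides, and descriptors not filled in here keep their empty defaults.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class BioDBCoreRecord:
    """Hypothetical container with one attribute per BioDBCore descriptor."""
    database_name: str                                                # 1
    main_resource_url: str                                            # 2
    contact_information: str                                          # 3
    year_established: Optional[int] = None                            # 4
    conditions_of_use: str = ""                                       # 5
    scope: List[str] = field(default_factory=list)                    # 6
    standards: List[str] = field(default_factory=list)                # 7
    taxonomic_coverage: List[int] = field(default_factory=list)       # 8 (NCBI Taxids)
    data_accessibility: List[str] = field(default_factory=list)       # 9
    data_release_frequency: str = ""                                  # 10
    versioning_policy: str = ""                                       # 11
    documentation_url: str = ""                                       # 12
    user_support_options: List[str] = field(default_factory=list)     # 13
    data_submission_policy: str = ""                                  # 14
    relevant_publications: List[str] = field(default_factory=list)    # 15
    wikipedia_url: str = ""                                           # 16
    tools_available: List[str] = field(default_factory=list)          # 17


# Partial example using values from the dictyBase entry in this deck.
dictybase = BioDBCoreRecord(
    database_name="dictyBase",
    main_resource_url="http://dictybase.org",
    contact_information="dictybase@northwestern.edu",
    year_established=2003,
    conditions_of_use="Free",
    taxonomic_coverage=[44689, 5786, 13642, 261658],
    relevant_publications=["PMID: 18974179", "PMID: 14681427"],
    wikipedia_url="http://en.wikipedia.org/wiki/DictyBase",
)

print(len(fields(BioDBCoreRecord)), "descriptors")
print(dictybase.database_name, dictybase.year_established)
```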
Database name: dictyBase
Main resource URL: http://dictybase.org
Contact information: dictybase@northwestern.edu
Date resource established (year): 2003
Conditions of use: Free
Scope (data types captured): genome sequence; gene models including CDS and predicted proteins; phenotypes; Gene Ontology annotations; functional annotation (gene product names); gene nomenclature; strains; plasmids; free-text descriptions; domains (via InterPro); orthologs (via OrthoMCL and inParanoid); protein subcellular location (via Swiss-Prot); protein existence (via Swiss-Prot); citations; researchers database
Curation policy: manual curation
Standards (MIs, data formats, terminologies): Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature
Data formats: FASTA, OBO, GAF, GFF3 (standard)
Taxonomic coverage (use NCBI Taxid): D. discoideum (44689), including all strains [PRIMARY]; also some genome/EST/gene model information for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)
Data accessibility/output options: HTML, text, database reports
Data release frequency: curators work on the 'live' database; weekly data dumps (sequences) or monthly (other data)
Versioning policy/access to historical files: no versioning, but access to historical files is possible
Documentation available: http://dictybase.org/FAQ/HelpFilesIndex.html
User support options: documents, email, web form
Data submission policy: data from published literature; some HTP data corresponding to published analyses is incorporated
Relevant publications: PMID: 18974179, PMID: 14681427
Resource’s Wikipedia URL: http://en.wikipedia.org/wiki/DictyBase
Tools available: BLAST, BioMart, Generic Genome Browser, TextPresso, MetaCyc (dictyCyc)
Implementation of BioDBCore at
BioSharing (many thanks to Philippe RS!)
BioDBcore announcement
Published in Nucleic Acids Research database issue 2011
and in the DATABASE journal
Implementation plan
•  Goal: BioDBCore data public and linked
•  Community-aware approach: reuse existing
resources
•  Current data model: RDF based on
categories from Biositemaps, MIRIAM, NIF,
Dublin Core, Darwin Core
•  Defined extension mechanisms
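To make the idea of an RDF-based record concrete, here is a minimal sketch that writes a few descriptors as N-Triples using only the Python standard library. The mapping is an assumption for illustration: the subject URI under `example.org` is hypothetical, and the choice of Dublin Core terms (`title`, `created`, `rights`) plus FOAF `homepage` is not taken from the actual BioDBCore data model.

```python
from typing import List, Tuple

DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core terms namespace
FOAF = "http://xmlns.com/foaf/0.1/"     # FOAF namespace (assumed here for the homepage link)


def uri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(subject: str, properties: List[Tuple[str, str]]) -> str:
    """Serialize (predicate IRI, already-formatted object) pairs, one triple per line."""
    return "\n".join(
        f"{uri(subject)} {uri(predicate)} {obj} ."
        for predicate, obj in properties
    )


# Hypothetical subject URI; the values come from the dictyBase example entry.
subject = "http://example.org/biodbcore/dictyBase"
triples = to_ntriples(subject, [
    (DCTERMS + "title", literal("dictyBase")),
    (DCTERMS + "created", literal("2003")),
    (DCTERMS + "rights", literal("Free")),
    (FOAF + "homepage", uri("http://dictybase.org")),
])
print(triples)
```

Reusing existing vocabularies in this way is what makes records linkable across registries; a production setup would more likely use an RDF library than hand-built strings.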
www.biodbcore.org
Example BioDBCore entry (1/2)
Example BioDBCore entry (2/2)
Creating, editing, maintaining entries
•  Until now: records are manually created from data
provided by NAR at publication of Database issue
and the Life Sciences Registry (Michel Dumontier and
Nick Juty)
- Those mostly come as xls files that need to be
manually entered
- Close to 200 records have been entered
out of over 2,000 obtained
Beyond maintenance at BioSharing
Ideally, database providers would keep their BioDBCore
record up to date
•  Claim ownership
- A database provider can now (in theory) maintain their
own BioDBCore record
Encouraging best practices
•  DATABASE and Nucleic Acids Research journals:
Editors in chief request BioDBCore information from
submitters
•  ISB seal of approval
•  BioDB100 - launched at InCoB 2011 – examples of 100 well
annotated databases
What’s next?
•  Continue to extend participating groups and journals
•  Refine scope
•  Integrate semantic support
•  Develop querying system
•  Implement validation tests
•  Set up mechanisms for exchange of data among
collaborating groups (in BioDBCore RDF format, or
other)
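One possible shape for the validation tests mentioned above is a function that returns a list of problems for a record. The rules and field names here are assumptions made for illustration (which descriptors are mandatory, the accepted year range, positive integer Taxids); the slides do not specify them.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Assumed set of mandatory descriptors; the real policy may differ.
REQUIRED_FIELDS = ("database_name", "main_resource_url", "contact_information")


def validate_record(record: Dict) -> List[str]:
    """Return human-readable problems; an empty list means the record passed."""
    problems: List[str] = []

    # Mandatory descriptors must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append(f"missing required field: {name}")

    # The main URL, when given, must be an absolute http(s) URL.
    url = record.get("main_resource_url")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"main_resource_url is not an http(s) URL: {url}")

    # The year, when given, must be an integer in a plausible range.
    year = record.get("year_established")
    if year is not None and not (isinstance(year, int) and 1900 <= year <= 2100):
        problems.append(f"year_established is not a plausible year: {year!r}")

    # Taxonomic coverage is expected as positive integer NCBI Taxids.
    for taxid in record.get("taxonomic_coverage", []):
        if not (isinstance(taxid, int) and taxid > 0):
            problems.append(f"invalid NCBI Taxid: {taxid!r}")

    return problems


good = {
    "database_name": "dictyBase",
    "main_resource_url": "http://dictybase.org",
    "contact_information": "dictybase@northwestern.edu",
    "year_established": 2003,
    "taxonomic_coverage": [44689, 5786],
}
bad = {
    "database_name": "",
    "main_resource_url": "dictybase.org",
    "year_established": "2003",
    "taxonomic_coverage": [44689, -1],
}

print(validate_record(good))
for problem in validate_record(bad):
    print(problem)
```

Returning all problems at once, rather than stopping at the first, fits the manual-entry workflow described earlier, where a curator would want to correct a record in one pass.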
Identifying or developing
semantic support
•  Policies and guidelines: BioSharing
•  Publications and taxon info: identifiers.org
•  Authors: ORCID (will also implement
organizations)
•  Keywords/database scope: NIF when possible
Identifying resources is preferable to developing them!
For BioHackathon 2013
•  Evaluate need for BioDBCore in today’s landscape
of metadatabase resources
•  Evaluate further collaboration opportunities
•  Set up a better system for creating and maintaining
BioDBCore records
•  Identify/develop ontologies pertinent to BioDBCore
Acknowledgements
Philippe Rocca-Serra
Susanna-Assunta Sansone
Eamonn Maguire
Alejandra Gonzalez Beltran
International Society for Biocuration
Michael Galperin
David Landsman
Francis Ouellette
Oxford University Press
collaborators
