The Barcode of Life
     Data Portal
(http://bol.uvm.edu)
 Dr. David E Schindel, Executive Secretary
    Michael Trizna, Database Specialist
 Consortium for the Barcode of Life (CBOL)
           Smithsonian Institution
             Washington, DC
          www.barcodeoflife.org;
  SchindelD@si.edu and TriznaM@si.edu
Contents of Presentation
Crowd-sourced open source software
How does Data Portal complement BOLD
and GenBank?
Data Portal capabilities
Case Study: Smithsonian frozen bird
tissue project
An Experiment in Museum Tissue
 Mining and Fast Data Release
  Tissue sampling winter/spring
  Sequencing completed in September
  Sequence quality control in October
  Taxonomic checking in early November
   – Obvious errors removed
   – Minor discrepancies remain
  Data released for Adelaide Conference
   – Crowd-sourced annotation by community
   – Will data be mis-used?
Unique Data Portal Capabilities
 Creating customized datasets from public
 and/or your private data
 Online library of standard datasets
 Support sharing within project teams using
 Connect IDs, easy link to Working Groups
 Running different identification analyses
 based on different methodologies:
  – Standard sequence input using FASTA format
  – Use standard or customized datasets
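As an illustration of the FASTA input mentioned above, here is a minimal sketch of reading FASTA-formatted query sequences before submitting them to an identification routine. The function name `parse_fasta` and the sample records are hypothetical; this is not the Data Portal's own code.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap, so concatenate them (upper-cased).
            records[current_id] += line.upper()
    return records


# Hypothetical input: two made-up query records.
example = """>sample1 hypothetical query sequence
ACGTACGT
ACGT
>sample2
ttgacc
""".splitlines()

queries = parse_fasta(example)
for record_id, sequence in queries.items():
    print(record_id, len(sequence))
```

Each header line starts a new record, and wrapped sequence lines are joined into a single string per record.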
Barcode Aggregator




 727,170 public records
Summary Statistics per Family
Creating Customized Datasets
Existing Data Analysis Packages
  LIST of packages
  – BLOG
  – BRONX
  – Kernel
  – CAOS
  – USEARCH
  – BLAST
  Output of identification routines as
  probabilities of assignment
Data Analysis Methods Session
 New packages presented Friday
 afternoon:
  – Damon Little: Automatic Plants Barcode
    pipeline (from raw traces to trimmed/edited
    sequences)
  – Ka Hou Chu: Composite Vector Method
    (profile trees for faster alignment and
    tree-based analysis)
  – Alain Franc: Matching Next Generation results
    to Sanger-based reference records
Sample output
CONNECT for Data Portal
    Collaboration
The USNM Bird Project
USNM Division of Birds frozen tissue
collection:
– 21,104 specimens, 2,512 species
Which new ones to sample/barcode?
Public records for birds
– All public bird COI records: 10,967
– All BARCODE records in GenBank: 8,419
– BARCODE with taxonomic names: 7,965
– BARCODE, name and 2 traces: 2,388
Moving Data Among
 BOLD, GenBank, Data Portal
  – USNM Excel spreadsheet (KE-Emu source)
  – Local database that holds all fields from the original spreadsheet
  – BOLD: split into projects that consist of 2-4 plates
  – Data Portal Aggregator database
Creating a ‘Pick List’
Spreadsheet of tissue samples compared
with:
– ITIS taxonomy
– Clements species list in BOLD
– Counts of GenBank and/or public BOLD
  records
– Geographic information
Screenshot of USNM list side-by-side with
BOLD records
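The pick-list comparison can be sketched roughly as follows: count how many public records already exist per species, then list the species with available tissue so that those with the fewest public records come first. The function name, field names, and species labels are all hypothetical placeholders, and the ranking rule is an assumption rather than the project's documented procedure.

```python
from collections import Counter
from typing import Dict, List


def build_pick_list(
    tissue_species: List[str], public_record_species: List[str]
) -> List[Dict[str, object]]:
    """Rank species with available tissue by how few public records they have."""
    public_counts = Counter(public_record_species)
    tissue_counts = Counter(tissue_species)
    rows = [
        {
            "species": species,
            "tissues_available": n_tissues,
            "public_records": public_counts.get(species, 0),
        }
        for species, n_tissues in tissue_counts.items()
    ]
    # Species with no public records sort first; ties break alphabetically.
    rows.sort(key=lambda row: (row["public_records"], row["species"]))
    return rows


# Hypothetical data: one entry per tissue sample / per public record.
tissues = ["Species A", "Species B", "Species A", "Species C"]
public = ["Species B", "Species B", "Species C"]

pick_list = build_pick_list(tissues, public)
for row in pick_list:
    print(row)
```

A real pick list would also carry the taxonomy checks and geographic information listed above; they are left out to keep the sketch short.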
Identifying Samples to be Subsampled
Side-by-Side Lists
USNM Bird Dataset
3,150 tissues sampled
168 failed sequences
94 problematic sequences
166 clustered badly
2,761 ‘BARCODE-ready’ samples
1,147 ‘first-BARCODE’ species
91% increase over 1,259 barcoded species
(3,892 listed in BOLD includes BINs, others)
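The 91% figure follows from the two species counts on this slide, and the check below reproduces it. The variable names are illustrative. The slide does not say whether the three problem categories overlap, so the sketch does not try to derive the BARCODE-ready count from them.

```python
# Counts taken from the slide.
tissues_sampled = 3150
barcode_ready = 2761
first_barcode_species = 1147
previously_barcoded_species = 1259

# Relative increase in barcoded bird species contributed by this dataset.
increase = first_barcode_species / previously_barcoded_species
print(f"Increase in barcoded species: {increase:.0%}")

# Species total if the new species are added to the earlier count.
total_species = previously_barcoded_species + first_barcode_species
print(f"Barcoded species after release: {total_species}")

# Share of sampled tissues that ended up BARCODE-ready.
success_rate = barcode_ready / tissues_sampled
print(f"BARCODE-ready share of sampled tissues: {success_rate:.0%}")
```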
Two problematic clades, USNM data
  Flycatchers: Family Tyrannidae
   – Sublegatus arenarum, S. modestus, S.
     obscurior, S. sp.
   – Conopias parvus, C. albovittatus
   – Myiarchus ferox, M. swainsoni, M. sp.
  Hummingbirds: Family Trochilidae
   – Phaethornis longuemareus
  Inconsistencies within USNM dataset
  Incompatibilities with public, other data
Resolving Mis-identified
      Specimens
What testing dataset to use?
ID trees and analytical routines could use:
– All public bird COI records: 10,967
– All BARCODE records in GenBank: 8,419
– BARCODE with taxonomic names: 7,965
– BARCODE, name and 2 traces: 2,388
Which ones have reliable taxonomic IDs?
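The four candidate datasets read as successively stricter filters on the same pool of records. Here is a minimal sketch of that tiering, assuming a hypothetical `Record` type whose fields are invented for illustration, and reading "2 traces" as "at least two trace files":

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    record_id: str
    marker: str                   # e.g. "COI"
    has_barcode_keyword: bool     # flagged as BARCODE in GenBank
    species_name: Optional[str]   # None when no taxonomic name is attached
    trace_count: int              # number of trace files on record


def tier_counts(records: List[Record]) -> Dict[str, int]:
    """Count records passing each successively stricter filter."""
    coi = [r for r in records if r.marker == "COI"]
    barcode = [r for r in coi if r.has_barcode_keyword]
    named = [r for r in barcode if r.species_name]
    traced = [r for r in named if r.trace_count >= 2]  # assumption: >= 2
    return {
        "public COI": len(coi),
        "BARCODE": len(barcode),
        "BARCODE with name": len(named),
        "BARCODE, name and 2 traces": len(traced),
    }


# Hypothetical records, not real data.
records = [
    Record("r1", "COI", True, "Species A", 2),
    Record("r2", "COI", True, "Species A", 1),
    Record("r3", "COI", True, None, 2),
    Record("r4", "COI", False, "Species B", 0),
    Record("r5", "cytb", False, "Species B", 0),
]

counts = tier_counts(records)
for tier, n in counts.items():
    print(f"{tier}: {n}")
```

None of these filters answers the slide's closing question, because a record can pass every tier and still carry a wrong identification.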
Preparing a Data Release Paper
 Summary statistics from Data Portal




 Figures from BOLD
