iPlant's Taxonomic Name Resolution
               Service

            Naim Matasci
    BIO5 / The iPlant Collaborative

           tnrs.iplantc.org
What is iPlant?
Empowering a New Plant Biology
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
TMU* Growth of Biological Collections (1600 – 2012)
[Chart: number of specimens (y-axis, 0 to 600,000,000) by year (x-axis, 1600 to 2020)]
*TMU: Totally Made Up
If you can't find it, it doesn't exist
Data Reuse

• What's the correlation between leaf
  morphology and leaf economy (R. Walls)?
• Evolution of pit domatia (M. Donoghue)
iPlant Data Store

• Based on iRODS
  – Metadata driven
  – Storing, Sharing and Distributing
• Redundant (mirrors at TACC and UoA)
• Really, really, really big (6 PB + 40 PB LTS)
• Really, really, really fast
iPlant Data Store Performance
                                    UC Berkeley to iDS
                               100GB: 29m15s
                            1 GB / 17.5 seconds
      Source                 Destination              Copy Method               Time (seconds)
      CD                     Desktop PC               cp                        320
      Berkeley Server        Desktop PC               scp                       150
      External Drive         Desktop PC               cp                        36
      USB 2.0 Flash          Desktop PC               cp                        30
      iDS                    Desktop PC               iget                      18
      Desktop PC             Desktop PC               cp                        15

Desktop PC (UA): Mac with 7.2K Internal Hard Drive
External Drive: USB 2.0: 5.4k Hard Drive
Flash Drive: USB 2.0 Patriot XT

    https://pods.iplantcollaborative.org/wiki/display/start/How+fast+is+the+iPlant+Data+Store
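
The per-gigabyte figure quoted above follows directly from the 100 GB transfer time. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB), since the slide does not say which convention was used; the function names are illustrative only:

```python
# Figures quoted on the slide: 100 GB moved from UC Berkeley to the iDS in 29m15s.
TRANSFER_GB = 100
TRANSFER_SECONDS = 29 * 60 + 15  # 29m15s expressed in seconds


def seconds_per_gb(total_gb, total_seconds):
    """Average time needed to move one gigabyte."""
    return total_seconds / total_gb


def throughput_mb_per_s(total_gb, total_seconds, mb_per_gb=1000):
    """Average transfer rate; mb_per_gb=1000 is an assumed (decimal) convention."""
    return total_gb * mb_per_gb / total_seconds


per_gb = seconds_per_gb(TRANSFER_GB, TRANSFER_SECONDS)
rate = throughput_mb_per_s(TRANSFER_GB, TRANSFER_SECONDS)
print(f"{per_gb:.2f} s per GB, about {rate:.0f} MB/s")
```

The result is consistent with the "1 GB / 17.5 seconds" line on the slide.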
PhytoBisque features
• Rich internet application (completely web based)
• Draws upon features from popular large-scale photo
  sharing sites and high-resolution aerial imagery (Google
  Maps)
• Ability to import and export more than 100 image formats,
  as well as movies
• Ability to import extremely large image sets using the iPlant
  Data Store
• Can display a 20K x 20K image in a standard web browser
• Data set management with tags and metadata
• Utilizes distributed computing (connected to iPlant
  execute environment)
Taxonomic uncertainty

1. Non-existent names
  •   Misspellings
  •   Contamination
      •   Annotations
      •   Morphospecies
      •   Digitization issues (frame shifts, character
          encoding)
  •   Lexical variants (digitization conventions)
2. Synonymy
  •   Nomenclatural synonyms
  •   Taxonomic synonyms / concepts
3. Misidentifications, incomplete identifications
Non-existent names: Herbarium specimens

Total specimens:                               1.1 million
Unique species names:                          53,052
Published names (legitimate & illegitimate):   44,532
Misspelled names:                              9,371 (18%)
Specimens with misspelled names:               101,237 (9%)

*New World plant specimens, 34 herbaria, simple match against IPNI and
 TROPICOS, excluding authors
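
A "simple match" of this kind can be sketched as an exact lookup of each unique name in a reference set. The reference names, the sample input, and the normalization rule below are hypothetical stand-ins, not the procedure actually used for the survey; the input names are assumed to have authors already removed. The last two lines reproduce the percentages on the slide from its own counts.

```python
# Hypothetical stand-in for names pooled from IPNI and TROPICOS (authors omitted).
REFERENCE_NAMES = {"Quercus alba", "Zea mays", "Acer rubrum"}


def normalize(name):
    """Collapse whitespace, capitalize the genus, lowercase the remaining parts."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def unmatched_names(names, reference=REFERENCE_NAMES):
    """Return the unique normalized names that have no exact match in the reference."""
    unique = {normalize(n) for n in names if n.strip()}
    return sorted(unique - reference)


def percent(part, whole):
    return 100.0 * part / whole


specimen_names = ["Quercus alba", "quercus  ALBA", "Zea mais", "Acer rubrum"]
print(unmatched_names(specimen_names))        # names with no exact match
print(round(percent(9371, 53052)))            # misspelled share of unique names
print(round(percent(101237, 1_100_000)))      # share of specimens affected
```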
Taxonomic Name Resolution Service

• Computer-assisted standardization of plant
  names
• Corrects spelling errors and alternative
  spellings against a standard list of names
• Converts out-of-date names to currently
  accepted names
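
The two steps named above (spelling correction against a standard list, then conversion of synonyms to accepted names) can be illustrated with a small sketch. This is not the TNRS algorithm: the similarity measure is Python's `difflib`, swapped in for illustration, and the name list, synonym table, `resolve` function, and `cutoff` value are all hypothetical.

```python
import difflib

# Hypothetical reference data; a real run would load a curated taxonomic source.
ACCEPTED_NAMES = ["Quercus alba", "Zea mays", "Solanum lycopersicum"]
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve(name, cutoff=0.85):
    """Match a submitted name to the closest known name, then map it to the accepted name.

    Returns None when no known name is similar enough (cutoff is an assumed threshold).
    """
    candidates = ACCEPTED_NAMES + list(SYNONYMS)
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    matched = matches[0]
    return {
        "submitted": name,
        "matched": matched,
        # A synonym is converted to its accepted name; an accepted name maps to itself.
        "accepted": SYNONYMS.get(matched, matched),
    }


print(resolve("Zea mais"))
print(resolve("Lycopersicon esculentum"))
print(resolve("Xyz"))
```

Returning the submitted, matched, and accepted names together keeps the correction auditable, which matters because a fuzzy match is a suggestion rather than a determination.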
Future

• More sources
  – Standard source import with DwC support
• Better performance
• TNRastic API
• Integration with Global Names components
• Web: http://tnrs.iplantc.org/
• Code: https://github.com/iPlantCollaborativeOpenSource/TNRS
• API (provisional): http://goo.gl/XnUiH
• TNRastic API: http://goo.gl/Z7Fkc
Brad Boyle
Brian Enquist
Juan Antonio Raygoza Garay
Nicole Hopkins
Zhenyuan Lu
Martha Narro
Shannon Oliver
William Piel
Jill Yarmchuk

Bob Magill (Missouri Botanical Garden)
Chris Freeland (Missouri Botanical Garden)
Chuck Miller (Missouri Botanical Garden)
Peter Jorgensen (Missouri Botanical Garden)
Amy Zanne (University of Missouri, St. Louis)
Peter Stevens (Missouri Botanical Garden)
Jay Paige (Missouri Botanical Garden)
Bob Peet (University of North Carolina at Chapel Hill)
Paul Morris (Harvard University)
Alan Paton (Kew Royal Botanic Gardens and their International Plant Names Index)
Tony Rees (Commonwealth Scientific and Industrial Research Organisation)
Michael Giddens (www.silverbiology.com)
Dmitry Mozzherin (Global Biodiversity Information Facility)
David Remsen (Global Biodiversity Information Facility)
David Patterson (Encyclopedia of Life)
Cam Webb (Harvard University)

Missouri Botanical Garden (Tropicos)

Funding provided by the National Science Foundation Plant Cyberinfrastructure
Program (grant #DBI-0735191).

iPlant TNRS for digital collections - iDigBio Workshop

  • 1. iPlant's Taxonomic Name Resolution Service Naim Matasci BIO5 / The iPlant Collaborative tnrs.iplantc.org
  • 3.
  • 4.
  • 5. Empowering a New Plant Biology
  • 7. TMU* Growth of Biological Collections (1600 – 2012) [Chart: number of specimens (0 to 600,000,000) by year (1600 to 2020)] *TMU: Totally Made Up
  • 8. If you can't find it, it doesn't exist
  • 9.
  • 10. Data Reuse
      • What's the correlation between leaf morphology and leaf economy (R. Walls)?
      • Evolution of pit domatia (M. Donoghue)
  • 11. iPlant Data Store
      • Based on iRODS
          – Metadata driven
          – Storing, Sharing and Distributing
      • Redundant (mirrors at TACC and UoA)
      • Really, really, really big (6 PB + 40 PB LTS)
      • Really, really, really fast
  • 12. iPlant Data Store Performance
      UC Berkeley to iDS
      100 GB: 29m15s
      1 GB / 17.5 seconds

      Source            Destination   Copy Method   Time (seconds)
      CD                Desktop PC    cp            320
      Berkeley Server   Desktop PC    scp           150
      External Drive    Desktop PC    cp            36
      USB 2.0 Flash     Desktop PC    cp            30
      iDS               Desktop PC    iget          18
      Desktop PC        Desktop PC    cp            15

      Desktop PC (UA): Mac with 7.2K Internal Hard Drive
      External Drive: USB 2.0: 5.4k Hard Drive
      Flash Drive: USB 2.0 Patriot XT
      https://pods.iplantcollaborative.org/wiki/display/start/How+fast+is+the+iPlant+Data+Store
  • 13. PhytoBisque features
      • Rich internet application (completely web based)
      • Draws upon features from popular large-scale photo sharing sites and high-resolution aerial imagery (Google Maps)
      • Ability to import and export more than 100 image formats, including movies
      • Ability to import extremely large image sets using the iPlant Data Store
      • Can display a 20K x 20K image in a standard web browser
      • Manage data sets with tags and metadata management
      • Utilizes distributed computing (connected to the iPlant execute environment)
  • 14. Taxonomic uncertainty
      1. Non-existent names
          • Misspellings
          • Contamination
          • Annotations
          • Morphospecies
          • Digitization issues (frame shifts, character encoding)
          • Lexical variants (digitization conventions)
      2. Synonymy
          • Nomenclatural synonyms
          • Taxonomic synonyms / concepts
      3. Misidentifications, incomplete identifications
  • 15. Non-existent names: Herbarium specimens
      Total specimens: 1.1 million
      Unique species names: 53,052
      Published names (legitimate & illegitimate): 44,532
      Misspelled names: 9,371 (18%)
      Specimens with misspelled names: 101,237 (9%)
      *New World plant specimens, 34 herbaria, simple match against IPNI and TROPICOS, excluding authors
  • 16. Taxonomic Name Resolution Service
      • Computer-assisted standardization of plant names
      • Corrects spelling errors and alternative spellings to a standard list of names
      • Converts out-of-date names to currently accepted names
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22. Future
      • More sources
          – Standard source import with DwC support
      • Better performance
      • TNRastic API
      • Integration with Global Names components
  • 23.
      • Web: http://tnrs.iplantc.org/
      • Code: https://github.com/iPlantCollaborativeOpenSource/TNRS
      • API (provisional): http://goo.gl/XnUiH
      • TNRastic API: http://goo.gl/Z7Fkc
  • 24.
      Brad Boyle
      Brian Enquist
      Juan Antonio Raygoza Garay
      Nicole Hopkins
      Zhenyuan Lu
      Martha Narro
      Shannon Oliver
      William Piel
      Jill Yarmchuk
      Bob Magill (Missouri Botanical Garden)
      Chris Freeland (Missouri Botanical Garden)
      Chuck Miller (Missouri Botanical Garden)
      Peter Jorgensen (Missouri Botanical Garden)
      Amy Zanne (University of Missouri, St. Louis)
      Peter Stevens (Missouri Botanical Garden)
      Jay Paige (Missouri Botanical Garden)
      Bob Peet (University of North Carolina at Chapel Hill)
      Paul Morris (Harvard University)
      Alan Paton (Kew Royal Botanic Gardens and their International Plant Names Index)
      Tony Rees (Commonwealth Scientific and Industrial Research Organisation)
      Michael Giddens (www.silverbiology.com)
      Dmitry Mozzherin (Global Biodiversity Information Facility)
      David Remsen (Global Biodiversity Information Facility)
      David Patterson (Encyclopedia of Life)
      Cam Webb (Harvard University)
      Missouri Botanical Garden (Tropicos)
      Funding provided by the National Science Foundation Plant Cyberinfrastructure Program (grant #DBI-0735191).
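
The two steps described on slide 16 (correcting spellings against a standard list, then converting out-of-date names to accepted ones) can be illustrated with a minimal sketch. This is not the TNRS implementation or its API: the reference list, the synonym table, the function name `resolve_name`, and the similarity cutoff are all assumptions made for illustration, and the matching uses Python's standard `difflib` rather than whatever algorithm the service actually uses.

```python
from difflib import get_close_matches

# Hypothetical reference data for illustration only. A real service would
# load names from curated sources such as Tropicos or IPNI.
ACCEPTED_NAMES = [
    "Quercus alba",
    "Acer saccharum",
    "Pinus ponderosa",
    "Solanum lycopersicum",
]
# Out-of-date name -> currently accepted name (illustrative entry).
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve_name(submitted, cutoff=0.8):
    """Return the accepted name for a submitted name, or None if nothing is close.

    Step 1: fuzzy-match the submitted string against every known name
            (accepted names and synonyms) to absorb spelling errors.
    Step 2: if the best match is a synonym, map it to its accepted name.
    The cutoff of 0.8 is an arbitrary assumption, not a TNRS setting.
    """
    known = ACCEPTED_NAMES + list(SYNONYMS)
    cleaned = " ".join(submitted.split())  # collapse stray whitespace
    matches = get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return SYNONYMS.get(best, best)


if __name__ == "__main__":
    for raw in ["Quercus albba", "Lycopersicum esculentum", "Zzzz yyyy"]:
        print(raw, "->", resolve_name(raw))
```

In this sketch a misspelled synonym is first corrected to the known synonym and only then mapped to the accepted name, so spelling correction and synonym conversion stay separate steps. Names with no sufficiently close match return `None` rather than a guess, which matters for the "non-existent names" cases listed on slide 14.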

Editor's Notes

  1. Bringing a culture of computing to the Plant Sciences.