Scaling-up collections digitisation
Vincent S. Smith, Vladimir Blagoderov, Ian Kitching & Thomas Simonsen
“The rate of progress by the UK taxonomic institutions in digitising and making collections information available is disappointingly low… there is a significant risk of damage to the international reputation of major institutions such as The Natural History Museum.” House of Lords Science and Technology Committee Report on Taxonomy and Systematics, 2009
Rate of digitisation at the NHM
Specimen focus
SatScan™ (by SmartDrive)
 
Example outputs
Diptera: http://sciaroidea.info/node/44309
Coreidae: http://sciaroidea.info/node/44310
Sackler Lab Trials
Nine test projects over 1 month (ent., bot. & palaeoent.)
- Assess utility for coll. management and research
- Understand technical & practical limitations
Sackler Lab Trials: Aperture, Exposure, Depth of Field & Resolution
Aperture   Exposure (ms)   DoF (mm)   Smallest resolvable structure (µm)
Open       11              6          56
Closed     810             80         98
Midway     41              17         59
General points / Implications / Entomology dept.
Caveats: NHM issues; Hardware / Software issues
Metadata capture is rate limiting
 
Possible Applications: Collection management; Research; Public engagement
Next Steps… Larger Scale Project to address NHM Issues
Acknowledgements
http://sciaroidea.info/sites/sciaroidea.info/files/SatScanTrialReport.pdf


Editor's Notes

  1. Good morning everybody. I want to switch gear slightly and tell you about some work I have been doing with my colleagues in Entomology to speed up the rate of digitisation. With the exception of the BHL project, what we have heard about so far are mostly small-scale projects digitising pockets of the NHM collection. These are mostly project-driven efforts, digitising on average a few thousand specimens. I’d argue that although these projects are useful, especially for people wanting these data, we need to take a more industrial approach to the problem of digitisation. I come to this conclusion based on two observations.
  2. The first is based on what last year’s HoL Science and Technology Committee said about digitisation in their report on the state of taxonomy and systematics. They said, and this is a direct quote: “The rate of progress by the UK taxonomic institutions in digitising and making collections information available is disappointingly low … there is a significant risk of damage to the international reputation of major institutions such as The Natural History Museum.”
  3. My second observation is that the HoL were absolutely right on both counts. Last year Graham Higley put together a cross-departmental group to look at digitisation efforts across the NHM. They gathered data from all the departments and looked at the rate at which metadata was being digitised (that’s things like collecting data from specimen labels), and the rate at which specimens were being digitised (in other words imaged) in various ways. *** From this they calculated that at present rates it would take 900 years to get the data off the collection, and 500 years to take the pictures. Now I don’t know about you, but if you believe that mass digitisation efforts are useful, and for reasons I’ll come to I’m one of those people that do, then I’m not prepared to wait that long. More to the point, I’m pretty certain our funders won’t, and even more importantly the people that might make use of this information (if they knew it was there) won’t wait either.
  4. Perhaps one of the reasons why we are rather constrained in our thinking about digitisation is our natural focus on specimens. The sheer magnitude and effort of individually handling the 70 million plus specimens in order to digitise them is enough to put anyone off, especially when we have so many other priorities, and when we are not entirely clear why we’d undertake this in the first place. *** But the truth is that most of our specimens are grouped in a way that makes them much easier to handle, and in such a way that they are on display. For example, in entomology, although we have 28 million specimens, most of them are held in drawers, and we only have 135,000 of those. *** If there was a way in which we could digitise these drawers, *** and if we could get sufficient information not only to see the specimens, but also perhaps to get taxonomic data from them, then perhaps the task of digitising the collection wouldn’t seem so great.
  5. What I want to tell you about is a piece of prototype equipment we have been testing in the Sackler Image Lab for the past couple of months that will do just this. It is produced by a company called SmartDrive, based in Cambridge, and is a combination of hardware and software that provides automated capture of lower resolution images, which are then assembled (stitched) into a larger panoramic image, generating an extremely high resolution final image. A telecentric camera with its attached lens is moved in two dimensions along precision rails positioned above the imaged object. This method maximises the depth of field of the captured images and minimises distortion and parallax artefacts. The best way of understanding this is not by me explaining it but by you seeing it in action, and I have a short movie that demonstrates this.
  6. This is the equipment based in the Sackler lab, and what Natalie is doing is placing a specimen drawer in the machine. These are some swallowtail butterflies, I think. She then sets the machine off from its starting position and it begins capturing images. What's happening under the hood is that the camera and lens are moved along precision rails at the top, and at each point they capture an image. Each of the original images is 1280 x 960 pixels. The images are tiled together on the computer, and that is what you can see on the screen. It takes about 5 minutes to do a typical sized entomology drawer, although for some of the larger drawers it can take up to 7 minutes. The images are then stitched together by the computer, giving a final image of up to 21000 by 21000 pixels. That's roughly 35 pixels per mm.
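As a rough check on the figures just quoted, the sketch below works out the length covered by one side of the stitched image at 35 pixels per mm, the size of one pixel on the specimen, and a lower bound on the number of camera frames per drawer. It assumes frames do not overlap, which is only a simplification (stitching normally needs some overlap, so the real frame count will be higher), and the function names are made up for illustration.

```python
import math

FRAME_W, FRAME_H = 1280, 960   # pixels per captured frame (from the talk)
FINAL_SIDE_PX = 21000          # maximum side of the stitched image (from the talk)
PX_PER_MM = 35                 # approximate sampling density (from the talk)


def covered_length_mm(side_px: int = FINAL_SIDE_PX, px_per_mm: float = PX_PER_MM) -> float:
    """Physical length covered by one side of the stitched image."""
    return side_px / px_per_mm


def pixel_pitch_um(px_per_mm: float = PX_PER_MM) -> float:
    """Size of one pixel on the specimen, in micrometres."""
    return 1000 / px_per_mm


def min_frames(side_px: int = FINAL_SIDE_PX) -> int:
    """Lower bound on frames per drawer, ASSUMING zero overlap between frames."""
    cols = math.ceil(side_px / FRAME_W)
    rows = math.ceil(side_px / FRAME_H)
    return cols * rows


if __name__ == "__main__":
    print(f"covered length: {covered_length_mm():.0f} mm")
    print(f"pixel pitch: {pixel_pitch_um():.1f} um")
    print(f"frames per drawer (no overlap): {min_frames()}")
```

Under these assumptions a full-size stitched image corresponds to a few hundred individual frames, which gives a feel for why a scan takes several minutes.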
  7. Here are some example outputs from the machine. This close-up here corresponds to the tiny white patch on the wing. In actual fact the area is smaller than the white patch, but for some reason I couldn't get PowerPoint to make a rectangle small enough. Just to be clear, the structures that you can see in this close-up are not pixels; these are individual scales on the butterfly wing. Of course butterflies are quite large, so let me show you some smaller objects. This is a drawer of fungus gnats. We have just mounted these images on the web using the Zoomify plug-in, which allows you to zoom in to very large images. As you can see, the images retain taxonomic information at the maximum zoom level and are still not pixelated. As a second example, here is a drawer of leaf-footed bugs (squash bugs). Again the images retain a high level of resolution and depth of field. In other words, the specimens and labels are in focus.
  8. SmartDrive have lent us the SatScan equipment for about a month to run some trials on various entomological, botanical and palaeoentomological parts of the collection. The goal of these trials was to assess its utility for collection management and research, to work with the company to understand the technical and practical limitations of the machine, and to look at options for how it could be improved. In this short trial we managed to digitise about 500 drawers. Key facts from this work are that the minimum resolvable structure, depending on the precise aperture and exposure settings, is 0.06-0.1 mm. Just to put that into perspective, this means that about 65% of the specimens in the entomology collection here could be usefully digitised at this resolution. The system (again depending on the exact aperture and exposure settings) gives a very high depth of field. In other words, objects like the specimen and the label (when it is not obscured) all stay in focus if they lie within a depth range of 10 to 80 mm. The file sizes are actually relatively small when you consider the size of the drawer. They are about 300-500 MB as a compressed TIFF image, which sounds a lot but really isn't too bad. Scanning time for a typical drawer is 5-7 minutes. This means that a single operator could do about 60 drawers a day. In addition you have to take into account the stitching time. This can take 5-10 minutes depending on the size of the drawer, meaning you can stitch images for about 90 drawers in a 12-hour period. However, this whole process can be batched so that it runs overnight; there is no need to be present while it is happening.
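The throughput figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the per-drawer times are the ranges quoted in the talk, the 8-minute stitching figure is an assumed average chosen because it reproduces the quoted 90 drawers in 12 hours, and the function names are hypothetical.

```python
SCAN_MIN_PER_DRAWER = (5, 7)      # minutes per drawer, range quoted in the talk
STITCH_MIN_PER_DRAWER = (5, 10)   # minutes per drawer, range quoted in the talk


def operator_hours(drawers: int, minutes_per_drawer: float) -> float:
    """Hours of scanning needed for a given number of drawers."""
    return drawers * minutes_per_drawer / 60


def drawers_in_window(window_hours: float, minutes_per_drawer: float) -> int:
    """Whole drawers that can be processed in an unattended time window."""
    return int(window_hours * 60 // minutes_per_drawer)


if __name__ == "__main__":
    fast, slow = SCAN_MIN_PER_DRAWER
    print("scanning 60 drawers takes",
          operator_hours(60, fast), "to", operator_hours(60, slow), "hours")
    for minutes in (*STITCH_MIN_PER_DRAWER, 8):   # 8 min is an ASSUMED average
        print(f"stitching at {minutes} min/drawer:",
              drawers_in_window(12, minutes), "drawers in 12 hours")
```

Sixty drawers a day therefore corresponds to five to seven hours at the machine, and the overnight stitching batch comfortably keeps up with a day's scanning across most of the quoted range.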
  9. This slide gives an overview of the relationship between the size of the aperture and the exposure time, which affect the depth of field and the size of the smallest resolvable structure. To save time I'm not going to go into detail here; suffice to say that with this camera combination you can resolve structures down to about 56 microns (that's 56 thousandths of a millimetre), but you only get a depth of field of 6 mm. [Any photographer will know that there is a trade-off between the size of the aperture on the lens and the length of exposure on the camera, which affect the depth of field and the level of resolution. The wider the aperture, the shorter the exposure. This narrows the depth of field but increases the resolution. In this case the smallest structure we could resolve was 56 microns, but this gives us a depth of field of just 6 mm. At the other end, if you close the aperture down and use a longer exposure, we could achieve 8 cm of depth of field but resolve less on those specimens. The implication of this is that if you have a tray of very small insects and you want to achieve a high resolution, they all need to be within the 6 mm depth of field. Conversely, if you had larger insects in the tray and could tolerate a lower level of resolution, the tolerance on the depth of field is much higher (8 cm). We used a Basler 1/2" CCD chip camera and an Edmund Optics 0.16x telecentric lens.]
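One way to make the trade-off concrete is to treat the three settings on the slide as a small lookup table and pick the sharpest one whose depth of field still covers the specimens. The values below are as read from the accompanying slide (open, midway and closed aperture); the selection rule and all names are illustrative assumptions, not part of the SatScan software.

```python
from typing import NamedTuple, Optional


class Setting(NamedTuple):
    aperture: str
    exposure_ms: int
    depth_of_field_mm: int
    resolution_um: int   # smallest resolvable structure


# Values as read from the slide; treat them as approximate.
SETTINGS = [
    Setting("open", 11, 6, 56),
    Setting("midway", 41, 17, 59),
    Setting("closed", 810, 80, 98),
]


def best_setting(height_spread_mm: float) -> Optional[Setting]:
    """Finest-resolution setting whose depth of field covers the given
    vertical spread of specimens and labels; None if nothing covers it."""
    usable = [s for s in SETTINGS if s.depth_of_field_mm >= height_spread_mm]
    return min(usable, key=lambda s: s.resolution_um) if usable else None


if __name__ == "__main__":
    for spread in (5, 10, 50, 100):
        print(spread, "mm ->", best_setting(spread))
```

A tray of small insects pinned at a uniform height would get the open aperture, while a drawer of large specimens with labels well below them would need the closed aperture and a much longer exposure.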
  10. What are the implications of all this? Well, at a general level, the system is best suited to drawers of numerous, uniformly positioned, medium sized specimens. For example, it gives excellent results with large and medium-size beetles, moths and butterflies. At this level of resolution sufficient information is usually preserved to allow identification of these specimens, often to species level. Objects less than 10 mm could not be imaged so adequately, although such images could be used in other ways, and I'll come on to possible uses in a moment. Another key point is that specimen labels and barcodes (when not obscured by the specimens) could be easily read from the digitised image. Within entomology this more specifically means that of the 135,000 drawers in the department, 85,000 could be usefully imaged at the current level of resolution with this system. This work could be completed in ~2024 person-days (ten person-years) using one system. It's worth noting that other lens / camera options might be explored to image the remaining drawers at a higher level of resolution.
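The effort estimate above can be unpacked with simple arithmetic. The sketch below only rearranges the numbers quoted in the talk; the variable and function names are made up, and the talk does not say which daily rate or working year was actually assumed.

```python
TOTAL_DRAWERS = 135_000    # entomology drawers (from the talk)
USABLE_DRAWERS = 85_000    # drawers suitable at current resolution (from the talk)
PERSON_DAYS = 2024         # quoted effort
PERSON_YEARS = 10          # quoted equivalent


def usable_fraction() -> float:
    """Share of drawers that could usefully be imaged."""
    return USABLE_DRAWERS / TOTAL_DRAWERS


def implied_drawers_per_day() -> float:
    """Daily rate implied by the quoted person-days."""
    return USABLE_DRAWERS / PERSON_DAYS


def implied_working_days_per_year() -> float:
    """Working days per person-year implied by the quoted figures."""
    return PERSON_DAYS / PERSON_YEARS


if __name__ == "__main__":
    print(f"usable fraction: {usable_fraction():.0%}")
    print(f"implied rate: {implied_drawers_per_day():.1f} drawers/day")
    print(f"implied working year: {implied_working_days_per_year():.1f} days")
```

The implied rate of roughly 42 drawers a day sits below the 60 a day that a single operator could manage on scanning time alone, so the estimate appears to leave some margin for handling and other overheads.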
  11. It won't have escaped your attention that there are some downsides with this approach. In fact I think there are three major issues we'd need to consider when evaluating its utility in the NHM. The first issue is metadata. This is such a big issue that I'll consider it on a separate slide. The next major issue is the utility of surface (usually dorsal) view images: they are not a panacea. There are plenty of parts of our collection where having surface views of specimens simply isn't that useful. For example, many mineralogical or palaeontological specimens have most of their information locked away inside them. Of course one might make the same point about the many other kinds of data (molecular, X-ray, chemical data) that are simply not accessible through images. The third major issue is that to make this process useful we would need to assign specimen-level identifiers to the objects we image. These can be physical labels, like barcodes, electronic labels actually on the images, and possibly both; I'll cover this on the next slide. Another consideration is the space required to store all these images. If we are going to store 85k stitched images, that equals about 28,222 GB or 27.6 TB, which sounds a lot, but in this day and age really isn't that much, especially when you consider the effort it represents. To make this system useful we need to make sure we develop the software to manage the workflow of processing the images. Likewise we have to integrate this with our existing systems like KE EMu and the DAMS that Ailsa will talk about. Finally, and perhaps most importantly, if we are to embrace this process as part of our work, it has implications for the way we use the collection for research and collection management processes, particularly in terms of things like staff time and general curation activities. Another point, although it is actually pretty trivial when you consider the size of the other points, is the cost. 
This is circa £50k (for outright purchase) or £2k per month hire. There are a few outstanding issues to do with the hardware and software of the system: the maximum scanning area is ~500 x 600 mm, which is insufficient for some drawers; there are occasional errors during scanning and stitching; focusing is currently time consuming; and access to the scanning area is inconvenient.
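The storage figure mentioned above can be checked in a few lines. The talk does not state the average file size used; 340 MB is an assumption chosen here because, with 1024 MB to the GB, it reproduces the quoted total, and it falls inside the 300-500 MB range given for the trial images. The function names are hypothetical.

```python
IMAGES = 85_000               # stitched drawer images (from the talk)
ASSUMED_MB_PER_IMAGE = 340    # ASSUMPTION: not stated in the talk


def storage_gb(images: int, mb_per_image: float) -> float:
    """Total storage in GB, using 1 GB = 1024 MB."""
    return images * mb_per_image / 1024


def storage_tb(images: int, mb_per_image: float) -> float:
    """Total storage in TB, using 1 TB = 1024 GB."""
    return storage_gb(images, mb_per_image) / 1024


if __name__ == "__main__":
    print(f"{storage_gb(IMAGES, ASSUMED_MB_PER_IMAGE):,.0f} GB")
    print(f"{storage_tb(IMAGES, ASSUMED_MB_PER_IMAGE):.1f} TB")
    # Bounds using the 300-500 MB range quoted for compressed TIFFs.
    print(f"{storage_tb(IMAGES, 300):.1f} to {storage_tb(IMAGES, 500):.1f} TB")
```

Even at the top of the quoted file-size range the total stays within a few tens of terabytes.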
  12. I want to go back to this issue of metadata capture, since this is perhaps the most controversial point about this approach. My first point is that metadata capture is the rate-limiting step. If you remember, on the second slide I showed, we established that at present rates it will take about 900 years to capture all the metadata from NHM specimens. This machine doesn't directly change this fact. However, I do want to make a few points about metadata capture that are important here. Firstly, specimen images and metadata need not be captured together. But if you don't do it at the same time (and arguably even if you do), at the very least you need a way of linking them back together at a later date, and you do this linking through shared identifiers. In other words, you have the same number on the specimen and on the image so you can link the two back together. These specimen-level identifiers might be physical, virtual or both. Assignment of virtual identifiers might be automated (though this requires some investigation). More likely we would prioritise metadata capture based on research and collection activities. Images are easy to get with this system, and we can image and re-image as required. We might think about more innovative ways of handling metadata capture, the assignment of identifiers and image cropping, for example through crowdsourcing the problem, though I don't have the time to go into this here.
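The idea of capturing images first and linking metadata later through shared identifiers can be sketched as a simple join. Everything here is hypothetical: the identifier format, the field names and the sample data are invented for illustration and do not describe any NHM system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SpecimenImage:
    identifier: str   # e.g. a barcode value visible in the drawer image
    path: str         # where the cropped or stitched image is stored


@dataclass
class SpecimenRecord:
    identifier: str   # the same identifier, transcribed with the label data
    taxon: str
    locality: str


def link(images: List[SpecimenImage],
         records: List[SpecimenRecord]
         ) -> Dict[str, Tuple[SpecimenImage, Optional[SpecimenRecord]]]:
    """Pair each image with its metadata record, if one exists yet."""
    by_id = {r.identifier: r for r in records}
    return {img.identifier: (img, by_id.get(img.identifier)) for img in images}


if __name__ == "__main__":
    # Made-up sample data: two specimens imaged, only one transcribed so far.
    images = [SpecimenImage("DEMO-0001", "drawer_17/0001.tif"),
              SpecimenImage("DEMO-0002", "drawer_17/0002.tif")]
    records = [SpecimenRecord("DEMO-0001", "Papilio machaon", "locality not recorded")]
    for ident, (img, rec) in link(images, records).items():
        print(ident, img.path, rec.taxon if rec else "metadata pending")
```

Because the join depends only on the identifier, imaging can run ahead of transcription, and the images still waiting for metadata are easy to list and prioritise.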
  13. In fact the separation of capturing the metadata, assigning identifiers and imaging the specimens was exactly what we observed when we opened the system up for others to use. This is a volunteer working on the British lichens collection, and she is adding barcodes to the specimens, noting metadata about the drawer (not the individual specimens) and then imaging the lot, all very quickly.
  14. So what are our next steps with this system? Well, our main goal is to set up a larger scale project to address the NHM issues we might have about using it, and I have just repeated the key issues here. At that point I'll stop, but before I do I want to acknowledge the help of SmartDrive Ltd (especially Mike Broderick & Dennis Murphy), and in particular their free loan of the system while we explore how we can make use of it. Without their help and their innovative work on the system, none of this would be possible. Thanks very much.