Public archiving of bio-imaging data – perspectives, challenges and outlook
Ardan Patwardhan
Outline
• Introduction
• EMDB and EMPIAR status
• Resources for EMDB and EMPIAR
• On-going projects, initiatives and plans
Introduction
Molecular and Cellular Structure
• Maintain and manage archives
  • PDB for atomic coordinate models
  • EMDB for 3DEM reconstructions
  • EMPIAR for 3DEM raw data
• Develop and maintain web services – searching, visualisation and validation
• Facilitate community-wide initiatives
• Key themes – integration with other bioinformatics resources and imaging scales and validation
Structural data archives
Archive  Type of data                         Founded  Organization                   Funding        # people  # entries  Size (avg. per entry)
PDB      Atomic coordinate models structures  1971     wwPDB (EBI, RCSB, PDBj, BMRB)  Core + grants  60-80     124286     1 TB (8 MB)
EMDB     3DEM volume structures               2002     EBI (+ RCSB, PDBj)             Core + grants  <10       4276       340 GB (80 MB)
EMPIAR   Raw image data for EMDB structures   2014     EBI                            grant          <5        61         40 TB (660 GB)
Stats until 9th Nov 2016
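The parenthesised figures in the Size column appear to be average entry sizes (total size divided by number of entries). A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes); the function and variable names are illustrative only:

```python
# Figures copied from the table above (stats until 9 Nov 2016).
# Assumption: decimal units, i.e. 1 TB = 10**12 bytes.
ARCHIVES = {
    "PDB": {"entries": 124286, "total_bytes": 1e12},
    "EMDB": {"entries": 4276, "total_bytes": 340e9},
    "EMPIAR": {"entries": 61, "total_bytes": 40e12},
}


def average_entry_bytes(entries, total_bytes):
    """Mean size of one entry in bytes."""
    return total_bytes / entries


def human(n_bytes):
    """Format a byte count using the largest decimal unit that fits."""
    for unit, factor in (("TB", 1e12), ("GB", 1e9), ("MB", 1e6)):
        if n_bytes >= factor:
            return f"{n_bytes / factor:.0f} {unit}"
    return f"{n_bytes:.0f} B"


for name, archive in ARCHIVES.items():
    print(name, human(average_entry_bytes(**archive)))
```

The results agree with the table to within rounding: the EMPIAR average comes out between the 660 GB shown in the table and the ~650 GB quoted on the EMPIAR metrics slide.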
What goes where...
• Final single-particle and sub-tomogram average maps must go to EMDB (tomograms strongly recommended)
• Fitted models must go to PDB
• Deposition of raw image data to EMPIAR is encouraged
EMDB: final map; EMPIAR: raw image data; PDB: fitted model
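The deposition rules on this slide can be written down as a small lookup. This is only a sketch: the dictionary keys and the requirement labels are illustrative names chosen here, not an official vocabulary of any of the archives.

```python
# Deposition rules as stated on the slide. Keys and level labels are
# hypothetical names used for illustration only.
DEPOSITION_RULES = {
    "single_particle_map": ("EMDB", "mandatory"),
    "subtomogram_average_map": ("EMDB", "mandatory"),
    "tomogram": ("EMDB", "strongly recommended"),
    "fitted_model": ("PDB", "mandatory"),
    "raw_image_data": ("EMPIAR", "encouraged"),
}


def where_to_deposit(data_kind):
    """Return (archive, requirement level) for a kind of 3DEM data."""
    try:
        return DEPOSITION_RULES[data_kind]
    except KeyError:
        raise ValueError(f"unknown data kind: {data_kind!r}") from None


for kind in DEPOSITION_RULES:
    archive, level = where_to_deposit(kind)
    print(f"{kind}: {archive} ({level})")
```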
Benefits of public archiving
• Reuse of data
  • starting models
  • compare structures of different functional states
  • different emphasis may lead to new discoveries
• Validation, methods development, testing, training
• Safe storage of data
• Integration of data with other public archives
• A resource for data mining
• Enables a bird's-eye perspective of the field
What does archiving involve?
• Working with the community, partners and journals to achieve a consensus on practices, policies and procedures
• Adapting to changing needs of data and meta-data collection
  • new sample preparation methods
  • new validation methods
• Providing means to deposit data, e.g., web-based deposition systems
• Curating data – automated + manual, remediation
  • maximize structured annotation, minimize free-text
• Developing added value resources for searching, validating and visualizing data
Viability
• Community support
• Value – uploads versus downloads
• Data transfer technologies – Aspera, Globus
• Data storage – file systems, object stores
• Data fidelity – quality measures and validation
• Annotation – structured versus unstructured
• Centralised versus distributed
EMPIAR
• Electron microscopy pilot (or public?) image archive
• Started in 2014
• Raw 2D image datasets related to EMDB
• Usage: validation, development, testing, teaching and…
• Safe storage of your data!
• Was source for data in EM Map Validation Challenge
• Multi-frame micrographs, averaged micrographs, particle stacks, tilt series
• Uses Aspera, Globus, ftp, http for data transfers
Websites
• emdb-empiar.org – EMDB website
• empiar.org – EMPIAR website
• pdbe.org – PDBe website
• wwpdb.org – Coordinating organization for the PDB archive
• emdatabank.org – EMDataBank NIH project website
• https://www.facebook.com/proteindatabank
• https://twitter.com/pdbeurope
EMDB and EMPIAR status
EMDB trends – released entries
Stats until 2 Nov 2016
[Chart: EMDB released entries trend, 2002-2016; series: cumulative entries, released entries per year]
EMPIAR metrics
• Number of entries: 61 (40TB; average size ~ 650GB)
• 7 TB+ sets; one 10TB+ dataset
• Transfer speed: uploads 1-2 TB/24h (Europe, US, Australia)
• “empiar” cited 20+ times in full-text open-access papers
• Nature Methods publication (Iudin et al., 2016)
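To put the quoted upload rate in perspective, the sketch below converts 1-2 TB/24h into MB/s and estimates how long an average-sized (~650 GB) dataset would take to upload. It assumes decimal units and a sustained transfer rate; both are simplifications, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def throughput_mb_per_s(tb_per_day):
    """Convert TB per 24 h into MB/s (decimal units assumed)."""
    return tb_per_day * 1e6 / SECONDS_PER_DAY


def upload_hours(dataset_gb, tb_per_day):
    """Hours needed to upload dataset_gb at a sustained rate of tb_per_day."""
    return dataset_gb / (tb_per_day * 1000) * 24


for rate in (1, 2):
    print(
        f"{rate} TB/24h = {throughput_mb_per_s(rate):.1f} MB/s; "
        f"650 GB takes about {upload_hours(650, rate):.1f} h"
    )
```

Under these assumptions an average dataset uploads in well under a day.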
[Charts, 2014-2016: Aspera uploads/month (users); Aspera uploads/month (TB); Total downloads (users); Total downloads (data)]
Resources for EMDB and EMPIAR
Searching EMDB - quick links + latest entries
emdb-empiar.org
EMStats – journal stats
Volume slicer
• Available for all EMDB entries
• Published in J Struct Biol (Salavert Torres et al., 2016)
emdb-empiar.org/emd-2363/3dslice
EMPIAR website
empiar.org
EMPIAR entry pages
empiar.org/empiar-10030
EMPIAR API
empiar.org/api/entry/empiar-10004
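A minimal sketch of how a script might call the endpoint shown above. The path comes from the slide; the https scheme, the lower-cased accession and the assumption that the response body is JSON are guesses, so check the current EMPIAR documentation before relying on them.

```python
import json
from urllib.request import urlopen

# Base path taken from the slide; the https scheme is an assumption.
API_BASE = "https://empiar.org/api/entry"


def entry_url(accession):
    """Build the API URL for an accession such as 'EMPIAR-10004'."""
    return f"{API_BASE}/{accession.lower()}"


def fetch_entry(accession, opener=urlopen):
    """Fetch one entry and decode it, assuming the body is JSON.

    `opener` is injectable so the function can be exercised without
    network access.
    """
    with opener(entry_url(accession)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access.
print(entry_url("EMPIAR-10004"))
```

Calling `fetch_entry("EMPIAR-10004")` would perform the actual request; the structure of the returned metadata is not described on the slide and is not assumed here.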
On-going projects, initiatives and plans
Volume browser
• Integrated visualisation of structural data
• Spanning scales from cells to molecules
Expert workshop on “3D segmentations and transformations - building bridges between cellular and molecular structural biology”
Madingley Hall, 6-7 Dec 2015
Co-funded by
File format and translators
• EMDB Segmentation File Format (EMDB-SFF)
  • adds structured biological annotation
  • handles transforms between tomograms and subtomograms
• Python scripts to read Segger, IMOD and Amira and convert to EMDB-SFF
• Working on displaying segmentations in OMERO
• Public open source distribution through CCP-EM
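To illustrate the two ideas named above (structured annotation, and transforms between sub-tomograms and tomograms), here is a hypothetical in-memory sketch. It is not the EMDB-SFF schema: the class, its field names and the example identifier are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One segmented object with structured (not free-text) annotation."""

    segment_id: int
    name: str
    # Hypothetical field: identifiers pointing at external ontologies/archives.
    external_references: list = field(default_factory=list)
    # 3x4 row-major affine transform (rotation | translation) that maps
    # sub-tomogram coordinates into the parent tomogram. Identity by default.
    transform: tuple = (
        (1.0, 0.0, 0.0, 0.0),
        (0.0, 1.0, 0.0, 0.0),
        (0.0, 0.0, 1.0, 0.0),
    )

    def to_tomogram(self, point):
        """Apply the affine transform to an (x, y, z) point."""
        x, y, z = point
        return tuple(
            row[0] * x + row[1] * y + row[2] * z + row[3]
            for row in self.transform
        )


# Usage: a segment whose sub-tomogram is shifted by (10, 20, 30) voxels.
segment = Segment(
    segment_id=1,
    name="example particle",
    external_references=["EXAMPLE:0001"],  # placeholder, not a real identifier
    transform=(
        (1.0, 0.0, 0.0, 10.0),
        (0.0, 1.0, 0.0, 20.0),
        (0.0, 0.0, 1.0, 30.0),
    ),
)
print(segment.to_tomogram((1, 2, 3)))
```

Keeping the transform with each segment is one way to let a single sub-tomogram average be placed at many positions in a tomogram without duplicating the volume data.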
Future directions
• Archiving for related imaging modalities including
  • 3D scanning electron microscopy
  • correlative light and electron microscopy
  • soft X-ray tomography
• Data harvesting pipelines
• Validation
  • Deposition support for new kinds of validation data
  • Validation servers, e.g., for visual analysis, map versus model FSC
  • Data-mining EMDB to develop new validation metrics
• Fast archive-wide sub-structure volumetric (or shape-based) searches
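For reference, the Fourier shell correlation (FSC) mentioned under validation is conventionally defined as below. The slide itself gives no formula, so this is the standard textbook form. Here F_1 is the Fourier transform of the experimental map and F_2 that of a map simulated from the fitted model, with sums running over all Fourier voxels in the shell at spatial frequency k.

```latex
\mathrm{FSC}(k) =
  \frac{\sum_{\mathbf{k}_i \in \text{shell } k} F_1(\mathbf{k}_i)\, F_2^{*}(\mathbf{k}_i)}
       {\sqrt{\sum_{\mathbf{k}_i \in \text{shell } k} \lvert F_1(\mathbf{k}_i) \rvert^{2}
              \;\sum_{\mathbf{k}_i \in \text{shell } k} \lvert F_2(\mathbf{k}_i) \rvert^{2}}}
```

Values near 1 indicate that map and model agree at that resolution shell; values near 0 indicate no correlation.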
Acknowledgements
• Gerard Kleywegt
• EM group
  • Sanja Abbott
  • Andrii Iudin
  • Paul Korir
  • Carlos Lugo
  • Eduardo Sanz Garcia
  • Jose Salavert Torres (UPV)
  • Ingvar Lagerstedt (EL)
  • Maya Holmdahl (UU)
  • Vladislav Lysenkov (MAMK)
• Birkbeck
  • Maya Topf
  • Agnel Praveen Joseph
  • Helen Saibil
• Baylor – Wah Chiu
• RCSB – Cathy Lawson
• Francis Crick
  • Lucy Collinson
  • Raffaella Carzaniga
• STFC
  • Martyn Winn
  • Tom Burnley
• Dundee
  • Jason Swedlow
  • Josh Moore
• CNB Madrid
  • Jose Maria Carazo
  • Pablo Conesa
  • Jose Miguel de la Rosa Trevin
  • Joan Segura Mora
• And many more!

2nd Microscopy Congress: Public archiving of bio-imaging data - perspectives, challenges and outlook


Editor's Notes

  1. (*) Biological annotation added by hand