Big Data Repository for
Structural Biology:
Challenges and Opportunities
Piotr Sliz, PhD
sliz@hkl.hms.harvard.edu
SBGrid: http://sbgrid.org
SBGrid Data Bank: http://data.sbgrid.org
Twitter: @SBGrid
YouTube: SBGridTV
SBGrid Consortium
Support Center at Harvard Medical School
300 Research Groups
13 Countries
Long Term Sustainability: Membership Fee
Harvard Medical School
SBGrid supports compilation, installation
and upgrades of ~300 scientific applications
Several Software Categories (EM, NMR, X-ray, Comp Chem, etc.)
Multiple versions of most applications
OS X (10.6-10.10) and Linux support (CentOS 5-7)
No additional, end-user configuration required
Software always works = more time for research
Core Mission:
Grid Computing (Open Science Grid VO + Grid Portal)
General Research Infrastructure (Boston Area)
Training (workshops, software cataloguing, webtales)
Webinars at youtube.com/SBGridTV
Developer Resources
Advocating for Open Source Software
Morin et al. Shining Light into Black Boxes. Science, 2012.
Other Activities:
Additional Publications
Primary Citation:
Other Citations:
New Opportunity:
Data
anonymous SBGrid member 1:
“we cannot find the original frames for many of our structures (move from X to Y), including recent high impact projects. What do you recommend that we do?”
anonymous SBGrid member 2:
“I was able to locate the data directory but I must have done a good job cleaning up the disk space before I left: usually there are only two .img files left in the data directory, the 1st and the last image of a full run.”
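The situation in the second quote, where only the first and last image of a run survive, can be detected with a simple check before a dataset is archived. The following is a minimal sketch, not part of the original slides. It assumes a hypothetical naming scheme of `<prefix>_<frame number>.img` and a single run per directory; real detector software uses a variety of naming conventions, so the pattern would need adjusting.

```python
import re
from pathlib import Path

# Assumption (hypothetical): frames are named <prefix>_<frame number>.img
FRAME_RE = re.compile(r"^(?P<prefix>.+)_(?P<num>\d+)\.img$")


def missing_frames(filenames):
    """Return the sorted frame numbers absent between the lowest and
    highest frame number found among the given file names."""
    numbers = set()
    for name in filenames:
        match = FRAME_RE.match(Path(name).name)
        if match:
            numbers.add(int(match.group("num")))
    if not numbers:
        return []
    expected = set(range(min(numbers), max(numbers) + 1))
    return sorted(expected - numbers)


if __name__ == "__main__":
    # Only the first and last frame of a 360-image run remain.
    gaps = missing_frames(["run1_0001.img", "run1_0360.img"])
    print(f"{len(gaps)} frames missing")
```

To check a real directory, the file names could be collected with `[p.name for p in Path(data_dir).glob("*.img")]`, where `data_dir` is whatever path holds the run.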
Lack of Storage Support for Diffraction Images
derive
reproduce
improve
correct
• Stokes-Rees, I., Levesque, I., Murphy, F.V., Yang, W., Deacon, A., and Sliz, P. (2012). Adapting federated cyberinfrastructure for shared data collection facilities in structural biology. J. Synchrotron Radiat. 19, 462–467.
• Terwilliger, T.C., and Bricogne, G. (2014). Continuous mutual improvement of macromolecular structure models in the PDB and of X-ray crystallographic software: the dual role of deposited experimental data. Acta Crystallogr. D Biol. Crystallogr. 70, 2533–2543.
• Terwilliger, T.C. (2014). Archiving raw crystallographic data. Acta Crystallogr. D Biol. Crystallogr.
• Guss, J.M., and McMahon, B. (2014). How to make deposition of images a reality. Acta Crystallogr. D Biol. Crystallogr. 70, 2520–2532.
Focus on Primary Data
SBGrid Data Bank. Pilot: May 1st, Production: June 1st, 2015
EZID
Dataset
Lock
BIODBCORE-000683
re3data.org
Data Mining and Annotation
Web Interface
Related Datasets
Depositors:
URL: data.sbgrid.org
Dataset Landing Page
DataCite Schema
CC0 License
Download
Dataset URL
Current Statistics
Publication Workflow:
Data Access Alliance:
Make Data easily accessible for reprocessing
Minimize Project Cost
Increase Redundancy
Challenges
Dataset Size (APIs, Data Access Alliance)
Journal + Data Automation
automated embargo release
cross-referencing
coordination/communication with journals
Data vs Journal Citations
Metrics:
Dataset Deposition Rates
Data Use: DAA Membership vs. direct downloads
Dataset Quality (Level 0-2)
Data Citations
Master Format
OME-TIFF vs DataCite vs Dataverse schema
Transition to a Research Data Management Software
ORCID integration and adoption
Opportunities
Better support for ~300 structural biology laboratories:
Compliance
Reproducibility
Integration with PDB and other repositories
Other data types in addition to X-ray diffraction
Thank you
Stephanie Socias
Pete Meyer
Merce Crosas
