GROWING OPEN DATA: MAKING
THE SHARING OF XXL-SIZED
RESEARCH DATA FILES ONLINE
A REALITY, USING EDINBURGH
DATASHARE
PAULINE WARD: PAULINE.WARD@ED.AC.UK @PAULINEDATAWARD
GEORGE HAMILTON
THE CHALLENGE
• Researchers are generating bigger files. At the University of
Edinburgh, every researcher is entitled to 500 GB of storage.
THE CHALLENGE
• Researchers need to be able to share their data online.
• For impact.
• For discoverability.
• For reproducibility.
• For compliance.
THE CHALLENGE
• DataShare is the Institutional Repository for research data for
staff and students at the University of Edinburgh:
datashare.is.ed.ac.uk.
• Previous file size limit of 2.1 GB.
• Largest file we’ve been asked to share: 20 GB – split into
smaller files.
• Largest fileset we’ve been asked to share: 226 GB – split into
smaller filesets.
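The old workaround of splitting a file to fit under the size limit can be sketched in Python. This is a hypothetical helper, not DataShare's actual procedure. The `.partNNN` naming scheme and the 1 MiB copy buffer are assumptions made for illustration.

```python
from pathlib import Path


def parts_needed(file_size: int, limit: int) -> int:
    """How many parts are needed so that no part exceeds `limit` bytes."""
    return max(1, -(-file_size // limit))  # integer ceiling division


def split_file(path: str, limit: int, out_dir: str) -> list[str]:
    """Split `path` into numbered parts of at most `limit` bytes each.

    Hypothetical helper: the "<name>.part000" naming is an assumption,
    not a DataShare convention.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    src = Path(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts: list[str] = []
    block_size = 1024 * 1024  # copy in 1 MiB blocks to bound memory use
    with src.open("rb") as f:
        while True:
            part_path = out / f"{src.name}.part{len(parts):03d}"
            remaining = limit
            with part_path.open("wb") as part:
                while remaining > 0:
                    block = f.read(min(block_size, remaining))
                    if not block:
                        break
                    part.write(block)
                    remaining -= len(block)
            if remaining == limit:
                # Nothing was written to this part: the input is exhausted.
                part_path.unlink()
                break
            parts.append(str(part_path))
    return parts
```

`parts_needed(20 * 10**9, 21 * 10**8)` returns 10. Under the old 2.1 GB limit, a 20 GB file therefore had to become at least ten separate files, which a user would then have to download and rejoin.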
THE CHALLENGE
• Some files had to be imported via a time-consuming batch
import process because they were too big or too numerous for
web deposit.
• Some files are still waiting to be shared because they are too
big for users to download conveniently.
• These files come from a wide range of disciplines and are
generated by a wide range of methods.
THE SOLUTION
• Getting the files from the depositors: address upload
• Allowing users to get the files: address download
THE SOLUTION: UPLOAD
• HTML5 resumable upload
THE SOLUTION: UPLOAD
• EDINA’s code for implementing HTML5 upload in DSpace is on
GitHub:
https://github.com/edina/DSpace/tree/xml-html5-upload
• Uses resumable.js
• This is an XMLUI rewrite of functionality that was already
available in the DSpace 5.0 JSPUI. See
https://jira.duraspace.org/browse/DS-1562 for further details.
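A minimal Python sketch can show how chunked, resumable upload works on the server side. It is not EDINA's DSpace code, which is Java. The class, its on-disk layout and the identifier check are all assumptions. Only the conventions borrowed from resumable.js are real: the client sends each chunk with an upload identifier and a chunk number that starts at 1. The client can also first ask whether a chunk already exists, so that a resumed upload skips it.

```python
import shutil
from pathlib import Path


class ChunkStore:
    """Hypothetical server-side store for a chunked, resumable upload."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _dir(self, identifier: str) -> Path:
        # Identifiers come from the client, so refuse anything that could
        # escape the upload root (for example "../").
        if not identifier.replace("-", "").replace("_", "").isalnum():
            raise ValueError(f"unsafe upload identifier: {identifier!r}")
        d = self.root / identifier
        d.mkdir(parents=True, exist_ok=True)
        return d

    def _chunk_path(self, identifier: str, n: int) -> Path:
        return self._dir(identifier) / f"{n:06d}"

    def has_chunk(self, identifier: str, n: int) -> bool:
        # Answers the client's "does this chunk exist?" probe on resume.
        return self._chunk_path(identifier, n).exists()

    def save_chunk(self, identifier: str, n: int, data: bytes) -> None:
        # Write to a temporary name, then rename, so that a connection dropped
        # mid-write never leaves a truncated chunk that looks complete.
        final = self._chunk_path(identifier, n)
        tmp = final.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(final)

    def missing(self, identifier: str, total_chunks: int) -> list[int]:
        # Chunk numbers start at 1, following resumable.js.
        return [n for n in range(1, total_chunks + 1)
                if not self.has_chunk(identifier, n)]

    def assemble(self, identifier: str, total_chunks: int, dest: str) -> bool:
        """Concatenate all chunks into `dest`; return False if any are missing."""
        if self.missing(identifier, total_chunks):
            return False
        with open(dest, "wb") as out:
            for n in range(1, total_chunks + 1):
                with self._chunk_path(identifier, n).open("rb") as part:
                    shutil.copyfileobj(part, out)  # stream, not load, each chunk
        return True
```

Because each chunk is stored independently, an interrupted upload loses at most the chunk in flight, and `missing()` tells a resuming client exactly what is left to send.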
THE SOLUTION: UPLOAD
• Testing shows files up to 15 GB upload successfully.
• (cf. figshare's 5 GB file size limit and Zenodo's 2 GB limit)
• A 20 GB file upload has been done in testing, but it generates an
error message in the browser, and the user must find the
submission on the Submissions page and Resume it.
• Multiple files can be uploaded by drag and drop.
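Resuming after the browser error amounts to re-sending only the chunks the server does not yet hold. The sketch below shows that bookkeeping. The 1 MiB chunk size used in the example matches resumable.js's default. The ceiling-division chunk boundaries are a simplifying assumption and may differ from how resumable.js sizes its final chunk.

```python
def chunk_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int, int]]:
    """Return (chunk_number, start, end) per chunk; numbers start at 1, end is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = -(-total_size // chunk_size)  # ceiling division; 0 for an empty file
    return [(n + 1, n * chunk_size, min((n + 1) * chunk_size, total_size))
            for n in range(count)]


def ranges_to_send(total_size: int, chunk_size: int,
                   received: set[int]) -> list[tuple[int, int, int]]:
    """Chunks the server has not acknowledged, i.e. what a resume must re-send."""
    return [c for c in chunk_ranges(total_size, chunk_size) if c[0] not in received]
```

For example, `ranges_to_send(10, 4, {1})` gives `[(2, 4, 8), (3, 8, 10)]`. A 15 GB file at 1 MiB per chunk comes to 14,306 chunks, so an interruption near the end costs only the chunks not yet stored rather than the whole upload.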
THE SOLUTION: DOWNLOAD
We wanted a mechanism, which DSpace doesn't provide, for
zipping up files for download.
• BitTorrent was one possible approach: it could be added at a
later date.
• Other approaches are possible (rsync, Secure Copy (SCP)).
THE SOLUTION: DOWNLOAD
• FTP download: agreed
• A tried and tested technology that we are confident we can put in
place and that will work well
• All files will be accessed from the FTP server anonymously
• Users can still download files over FTP in a web browser
• Users who wish to can use an FTP client, which allows them to
resume an interrupted download
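An FTP client's resume capability rests on the FTP `REST` command. Below is a minimal sketch using Python's standard `ftplib`, whose `retrbinary` method accepts a `rest` byte offset. The function name is hypothetical, and nothing here reflects DataShare's actual server configuration.

```python
import ftplib
import os


def resume_download(ftp: ftplib.FTP, remote_name: str, local_path: str,
                    blocksize: int = 1024 * 1024) -> int:
    """Download `remote_name`, continuing from where a previous attempt stopped.

    Hypothetical helper. Returns the number of new bytes written.
    """
    # Bytes already on disk from an earlier, interrupted attempt.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    written = 0

    def on_block(block: bytes) -> None:
        nonlocal written
        out.write(block)
        written += len(block)

    # Append mode keeps the bytes we already have. `rest` makes ftplib send
    # REST <offset> before RETR, so the server starts mid-file.
    with open(local_path, "ab") as out:
        ftp.retrbinary(f"RETR {remote_name}", on_block,
                       blocksize=blocksize, rest=offset or None)
    return written
```

In practice you would open `ftplib.FTP(host)` (the slides do not give the host) and call `ftp.login()` with no arguments, which logs in as `anonymous`. You would then call `resume_download` again after any dropped connection. Some servers reject `REST` at or past the end of a file, so a caller may want to compare the local size with `ftp.size(remote_name)` first.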
THE SOLUTION: DOWNLOAD
• Specification:
• All files will still be required to have appropriate metadata stored in
DSpace
• All filesets will now be downloadable as a zip file (previously limited
to 5.2 GB)
• Move the DSpace assetstore to a location with more storage available
• Download counts for files retrieved by SFTP will be added to
DSpace statistics
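The sketch below shows one way to gather download counts for DSpace statistics: tally completed downloads per file from a transfer log. It assumes the server writes the standard `xferlog` format, as wu-ftpd does and vsftpd can. An OpenSSH SFTP server does not produce this log, so the parsing would need adapting. Loading the counts into DSpace statistics is not shown.

```python
from collections import Counter


def count_downloads(log_lines) -> Counter:
    """Count completed outgoing transfers per file in xferlog-format lines.

    Assumed record layout: a 5-token timestamp, transfer-time, remote-host,
    file-size, filename, then nine fields: transfer-type, special-action-flag,
    direction, access-mode, username, service-name, authentication-method,
    authenticated-user-id, completion-status.
    """
    counts: Counter = Counter()
    for line in log_lines:
        tokens = line.split()
        if len(tokens) < 18:
            continue  # not a complete xferlog record
        # Parse the filename from both ends so single spaces in names survive.
        filename = " ".join(tokens[8:-9])
        direction, status = tokens[-7], tokens[-1]
        if direction == "o" and status == "c":  # outgoing (download), complete
            counts[filename] += 1
    return counts
```

Counting only complete (`c`) transfers avoids inflating the figures with the partial attempts that resumable downloads naturally produce.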
THE SOLUTION: DOWNLOAD
• This replaces our current on-the-fly creation of zip files from
Item bitstreams.
• It will mitigate potential performance issues, because it will use
fewer server resources (Java threads and RAM).
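Here is a sketch of the alternative to on-the-fly zipping: build each item's archive once, ahead of time, and publish it for FTP download. The function is hypothetical. It uses Python's standard `zipfile`, where the `allowZip64` flag controls ZIP64 extensions; these are needed for archives or members above 4 GiB.

```python
import zipfile
from pathlib import Path


def build_item_zip(bitstreams: list[str], dest: str) -> str:
    """Write all of an item's files into one zip archive at `dest` (hypothetical helper)."""
    names = [Path(p).name for p in bitstreams]
    if len(set(names)) != len(names):
        raise ValueError("bitstream file names must be unique within an item")
    tmp = Path(dest).with_suffix(".partial")
    # ZIP_STORED skips compression: cheaper on CPU, and already-compressed
    # formats barely shrink anyway. allowZip64 is the default since
    # Python 3.4 and is stated here only for clarity.
    with zipfile.ZipFile(tmp, "w", compression=zipfile.ZIP_STORED,
                         allowZip64=True) as zf:
        for path, name in zip(bitstreams, names):
            zf.write(path, arcname=name)
    # Rename into place so FTP users never see a half-built archive.
    tmp.replace(dest)
    return dest
```

Building once means the cost is paid a single time rather than on every request, which is where the saving in threads and RAM comes from. The trade-off is extra storage for the archive alongside the original bitstreams.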
SUMMARY
• We have implemented HTML5 upload in the DataShare (DSpace)
web interface to allow depositors to easily and quickly deposit
individual files up to 15 GB.
• We are working on integrating an SFTP server to allow users to
retrieve filesets larger than our current 20 GB limit. Storage
rather than network/browser timeout will become the limiting
factor on fileset size. We anticipate making numerous filesets of
around 100 GB available in this way in the medium term.
