SlideShare uma empresa Scribd logo
1 de 57
Baixar para ler offline
Electronic lab notebooks and data repositories
Complementary responses to the scientific data problem
ACS Dallas session on
Research data and electronic lab notebooks
17 March 2014
Rory Macneil, with thanks to Sunny Yang, Robin Rice,
George Hamilton and Gary Ferguson
What we’re going to cover
• The scientific data problem
• ELNs and data repositories: Complementary
approaches to solving the problem
• 1 + 1 = 3? An example of ELN-repository integration:
• RSpace and Edinburgh DataShare
• Looking ahead
Research workflow 1
And the data gets lost
Publish
results
Document
research
Enter data
The scientific data problem:
How to capture data and make it
available for scrutiny and reuse?
• Ease of data entry drives uptake, but adding
structure to data enhances value
• Different types of data, different fields of
research, different styles of documentation
Whose problem?
Funders
Communities
Researchers
PIs/Labs
Institutions
The scientific problem rephrased:
How to add structure to data?
– Who decides the parameters?
• Researcher
• PI
• System (ELN/Repository)
• Community (Dial-a-Molecule, Henry Rzepa et al)
– Who adds it and how?
• Researcher
• Curator
• Machine
– When is it added?
• Pre-documentation (in system)
• During documentation (in ELN)
• Post-documentation (in Repository)
– How much scope for variation?
How do ELNs and data repositories
deal with this problem?
What is an electronic lab notebook?
Software that enables researchers to manage
and present research data
ELNs: The first generation
1990s origins for Big Pharma
• Windows based
• For pcs
• Domain and project specific
• Complex
• Required lots of training
• Very expensive
ELNs: The second generation
2000s for academic researchers
– Generic tools adapted by individuals
• Evernote, Dropbox
• Web-based, platform agnostic
• Generic, flexible
• Very easy to use, very cheap
– Lab-oriented ELNs
• eCAT, Lab Archives, Ruro
• Web-based, platform agnostic
• Generic, flexible
• Easy to use, cheap
• Sharing and groups
ELNs: We need a third generation
2011/12 Wisconsin pilot
“We need an ELN that can be rolled out across the
university”
It has to:
• Be easy to use
• Be platform agnostic
• Support intra- and inter-group collaboration
• Have enterprise capabilities
• Support data publishing and archiving
and . . .
Still be affordable!!!!!!!!!”
Institution as customer:
Brave new world
• ELN must provide flexibility and breadth across
disciplines
• University provides
– Funding
– Support
• IT
• Training
• Driving
– Mass uptake
– Consolidation of providers
Who and what is driving demand for
ELNs?
• Researchers
– Utility and convenience of paper lab book + online capabilities
– On multiple devices
– File management/integration
• Groups/PIs
– Controlled sharing
– Collaboration
– Group management
– File management/integration
• Institutions: Data librarians, IT, commercialisation offices
– Enterprise features: Scalable deployment, Single Sign On
– IP protection: audit trail, signing
– Publishing
– Archiving
– Repository integration
– File management/integration
Data repositories
An information repository is an easy way to deploy a
secondary tier of data storage that can comprise
multiple, networked data storage technologies running
on diverse operating systems, where data that no
longer needs to be in primary storage is protected,
classified according to captured metadata, processed,
de-duplicated, and then purged, automatically, based
on data service level objectives and requirements.
Three types of data repositories
• Domain specific
– Chemistry: PubChem, ChemSpider
– Life Sciences: Dryad, NCBI Databases
• Institutional
• Generic
– Figshare
What’s driving demand for data
repositories?
• Domain Specific
– Who?
• Research communities
– Why?
• Capture data for review and reuse
• Improve quality and reliability of data
• Standardize research techniques to improve productivity
• Institutional
– Who:
• Funders → Data librarians
– Why?
• Maintain data
• Make data available for reuse
ELNs and data repositories:
what do they do?
Data entry
Data organization
Documentation
Metadata creation
Data and metadata export
Data import
Data preservation
Data sharing
Data reuse
ELN Repository
ELNs + Data repository
enables new workflow
ELN
Enter data and
document research
Data repository
Store data and
metadata
Publication
Publish results
Data is captured, structured and made available for reuse
Research workflow 1
And the data gets lost
Publish
results
Document
research
Enter data
ELNs + Data repository
enables new workflow
ELN
Enter data and
document research
Data repository
Store data and
metadata
Publication
Publish results
Data is captured, structured and made available for reuse
An example of integration:
RSpace and Edinburgh DataShare
RSpace
• Conceived in response to Wisconsin RFP and
trial
• Developed with Wisconsin by Research Space
2012 - 2013
Researcher experience
Sketching √
Image annotation √
Chemical structures √
Notebook √
Forms √
Templating √
PDF export √
Export to Word √
File gallery √
Journal view √
Tablet friendly √
Clean design √
Performance √
Round trip editing √
Offline access √
Sample management √
PI/Lab support
Sharing √
Messaging √
Lab set up enabled √
Group management √
Inter-group collaboration √
Institutional requirements
(IT, data librarians, commercialisation)
Single sign on √
Sys Admin/RSpace admin √
Group set up √
IP support √
Export to XML √
Archiving √
Repository integration √
RSpace design advantages
• Easy data entry
• Easy and flexible data structuring
• Multiple ways of getting data out (and back in)
– Export PDF
– Export Zip (XML)
– Re-import, preserving structure
– Archive (with metadata)
• Inter – institutional support
• Re-import, preserving structure
What is Edinburgh DataShare?
Edinburgh DataShare is a free-at-point-of-
use data repository service which allows
University researchers to upload, share, and
license their data resources for online
discovery and re-use by others.
Whence Edinburgh DataShare?
• The service was built as an output of the
DISC-UK DataShare project, which
explored pathways for academics to share
their research data over the Internet at the
Universities of Edinburgh, Oxford and
Southampton (2007-2009, Jisc
Repositories and Preservation
Programme).
Why Edinburgh Datashare?
“7. Research data management plans must
ensure that research data are available for
access and re-use where appropriate and
under appropriate safeguards.”
Edinburgh Datashare and
University RDM policy
“9. Research data of future historical
interest, and all research data that represent
records of the University, including data that
substantiate research findings, will be
offered and assessed for deposit and
retention in an appropriate national or
international data service or domain
repository, or a University repository.”
Edinburgh Datashare and
University RDM policy
“10. Exclusive rights to reuse or publish
research data should not be handed over to
commercial publishers or agents without
retaining the rights to make the data openly
available for re-use, unless this is a
condition of funding.”
Does DSpace work for data?
• Communities, collections, data items, files
• Metadata subset from DCMI “dcterms”
vocabulary, RDF-compliant – aids
discovery
• Persistent identifier (handle) assigned
• Suggested citation provided; viewable
download statistics
• Single-sign on with EASE
• Embargo option – delayed publication
Quality controls
– New users can register with EASE login but
are given permission to add to a collection
– Data Library staff or collection administrator
approves each item before publication
– Dataset / data item must include
documentation as well as data files
– Metadata field to point to related publication
(“is referenced by”)
RSpace – Edinburgh Datashare
integration: Overview
• Background
• People
– Edinburgh: Robin Rice, George Hamilton
– Research Space: Sunny Yang, Gary Ferguson
• Process
– Understanding DataShare (DSpace + SWORD + Datashare)
– Backend integration
– METS (metadata encoding transaction standard)
development
– Interface decisions
RSpace – Edinburgh Datashare
integration: Backend platform
– Edinburgh DataShare has three interfaces/APIs
• Web-UI
• Python
• SWORD (simple Java based web-service which supports
repository deposits)
– RSpace uses the SWORD Interface
– The SWORD server accepts a file for deposition if a
METS description file is provided
Four part METS implementation in
RSpace – Datashare integration
• RSpace uses the standard METS header
• DMD -- field definitions are based on Dublin Core
– Four required fields in Edinburgh DataShare -- contributor,
publisher, title, and data creator -- must be completed as part
of the deposit through RSpace
– Additional optional fields can be filled in later by DataShare
administrator:
• FUNDER, SPATIAL_COVERAGE, TIME_PERIOD, DATA_CREATOR,
AVAILABLE_DATE, DESCRIPTION_ABSTRACT, DESCRIPTION_TOC, LANGUAGE,
RELATION_VERSION_OF, RELATION_REFERENCED_BY, SUPERCEDES, RIGHT,
SOURCE, SUBJECT_KEYWORDS, SUBJECT_CLASSIFICATION, ALTERNATIVE_TITLE
• All zipped files and their mime-types (e.g. application/pdf,
text/html) are included
• A structure map describes the full structure and
relationships between the above three elements
RSpace – Edinburgh Datashare
integration: Workflow
• Front end trigger
– An RSpace user selects files/folders/notebooks to be deposited from
RSpace, and starts the deposit process
• Backend to support the user workflow
– RSpace extracts the associated data and resources from its database
and file-store
– These are turned into xml files
– METS is used to describe the zip file and each selected file
– The xml, resource, and METS files are zipped into a zip file for
archiving
– The DSpace SWORD client deposits the zip file to DataShare after an
authentication and validation procedure
– File deposited in Collection associated with Depositor
User workflow 1
My records in RSpace
User workflow 2a
Record(s) for export
User workflow 2b
Record(s) for export
User workflow 2c
Selecting record(s) for export
User workflow 2d
Selecting record(s) for export
User workflow 3
Deposit dialog
User workflow 4
Completed deposit dialog
User workflow 5
Depositing dialog
User workflow 6
DataShare agreement message
User workflow 7
DataShare agreement email
User workflow 8
Submission approval email
User workflow 9
Submission screen in DataShare
Looking ahead 1:
Integrating ELNs with other digital
resources
– Other institutional repositories
– Domain specific repositories
• PubChem, ChemSpider
– Archives
• Institutional
• Commercial
– Chemical Semantics
– CDD Vault
Future workflow 1
ELN
Enter data and
document research
Data
repositories
Store data and
metadata
Publications
Publish data and
results
Looking ahead 2:
ELN as interactive publishing platform
Institutional
repositories
Datasets
(Publications)
Archives
Domain
repositories
ELN
Data
Metadata
ELN transformed from point tool to
infrastructural glue at the heart of the
research process
ELN also becomes the backbone of
data management at the University
Integration with
– Data storage
– Data repository
– Data archive
Scientific data: whose problem?
Funders
Communities
Researchers
PIs/Labs
Institutions
Which model(s) will win?
Those that best satisfy the core requirements of
the key constituencies
• Easy data entry and usability
– Researchers, Labs/PIs
• Easy integration with write up/publication
– Researchers, Labs/PIs
• Enhanced collaboration and better management of data and research
– PIs/labs, Institutions
• Comprehensive and easy data capture and availability for reuse
– Institutions/Communities
• Flexibility to support different workflows and outputs
– Researchers, Labs/PIs, Communities
• And by the way, as Wisconsin said:
It has to be
affordable!
Thank you
Rory Macneil
rmacneil@researchspace.com

Mais conteúdo relacionado

Mais procurados

Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data LifecycleSteven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data LifecycleSteve Androulakis
 
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017ARDC
 
On being a cog rather than inventing the wheel: Edinburgh DataShare as a key ...
On being a cog rather than inventing the wheel: Edinburgh DataShare as a key ...On being a cog rather than inventing the wheel: Edinburgh DataShare as a key ...
On being a cog rather than inventing the wheel: Edinburgh DataShare as a key ...EDINA, University of Edinburgh
 
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons LearnedWWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons LearnedStefan Dietze
 
A national repository (library?) service for learning materials
A national repository (library?) service for learning materialsA national repository (library?) service for learning materials
A national repository (library?) service for learning materialsEDINA, University of Edinburgh
 
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...datascienceiqss
 
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...ResearchSpace
 
Zaven Akopov (DESY -L-) For the INSPIRE Collaboration DESY ...
Zaven Akopov (DESY -L-) For the INSPIRE Collaboration DESY ...Zaven Akopov (DESY -L-) For the INSPIRE Collaboration DESY ...
Zaven Akopov (DESY -L-) For the INSPIRE Collaboration DESY ...Zaven Hakopov
 

Mais procurados (20)

Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data LifecycleSteven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
 
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
 
On being a cog rather than inventing the wheel: Edinburgh DataShare as a key ...
On being a cog rather than inventing the wheel: Edinburgh DataShare as a key ...On being a cog rather than inventing the wheel: Edinburgh DataShare as a key ...
On being a cog rather than inventing the wheel: Edinburgh DataShare as a key ...
 
Using a dumb identifier to do smart things
Using a dumb identifier to do smart thingsUsing a dumb identifier to do smart things
Using a dumb identifier to do smart things
 
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons LearnedWWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
 
Roles & Skills for RDM
Roles & Skills for RDMRoles & Skills for RDM
Roles & Skills for RDM
 
PEPRS: Recording The Extent Preserved
PEPRS: Recording The Extent PreservedPEPRS: Recording The Extent Preserved
PEPRS: Recording The Extent Preserved
 
A national repository (library?) service for learning materials
A national repository (library?) service for learning materialsA national repository (library?) service for learning materials
A national repository (library?) service for learning materials
 
Edina cigs-21-september-2012
Edina cigs-21-september-2012Edina cigs-21-september-2012
Edina cigs-21-september-2012
 
Tales from the Keepers Registry
Tales from the Keepers RegistryTales from the Keepers Registry
Tales from the Keepers Registry
 
RDM Programme @ Edinburgh - Service Interoperation
RDM Programme @ Edinburgh - Service InteroperationRDM Programme @ Edinburgh - Service Interoperation
RDM Programme @ Edinburgh - Service Interoperation
 
Edinburgh DataShare - DSpace for Data
Edinburgh DataShare - DSpace for DataEdinburgh DataShare - DSpace for Data
Edinburgh DataShare - DSpace for Data
 
Introduction to RDM for trainee physicians
Introduction to RDM for trainee physiciansIntroduction to RDM for trainee physicians
Introduction to RDM for trainee physicians
 
RDM Programme @ Edinburgh
RDM Programme @ Edinburgh RDM Programme @ Edinburgh
RDM Programme @ Edinburgh
 
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
Preservation of Research Data: Dataverse / Archivematica Integration by Allan...
 
Aggregation as Tactic
Aggregation as TacticAggregation as Tactic
Aggregation as Tactic
 
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
Service integration to Enhance RDM: RSpace electronic lab notebook at the Uni...
 
Engaging the Researcher in RDM
Engaging the Researcher in RDMEngaging the Researcher in RDM
Engaging the Researcher in RDM
 
Organising and Documenting Data
Organising and Documenting DataOrganising and Documenting Data
Organising and Documenting Data
 
Zaven Akopov (DESY -L-) For the INSPIRE Collaboration DESY ...
Zaven Akopov (DESY -L-) For the INSPIRE Collaboration DESY ...Zaven Akopov (DESY -L-) For the INSPIRE Collaboration DESY ...
Zaven Akopov (DESY -L-) For the INSPIRE Collaboration DESY ...
 

Semelhante a Integrating an electronic lab notebook with a data repository; American Chemical Society March 2014

Scottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShareScottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShareRobin Rice
 
Staffing Research Data Services at University of Edinburgh
Staffing Research Data Services at University of EdinburghStaffing Research Data Services at University of Edinburgh
Staffing Research Data Services at University of EdinburghRobin Rice
 
Integrating an electronic lab notebook with a university it environment rdmf ...
Integrating an electronic lab notebook with a university it environment rdmf ...Integrating an electronic lab notebook with a university it environment rdmf ...
Integrating an electronic lab notebook with a university it environment rdmf ...rmacneil88
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationMANENDRASINGH30
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
How Elns can galvanise research data management.
How Elns can galvanise research data management. How Elns can galvanise research data management.
How Elns can galvanise research data management. rmacneil88
 
Integrating repositories and eLab notebooks through an open science framework
Integrating repositories and eLab notebooks through an open science frameworkIntegrating repositories and eLab notebooks through an open science framework
Integrating repositories and eLab notebooks through an open science frameworkrmacneil88
 
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...DuraSpace
 
ESI Supplemental 1 E-research Support Slides
ESI Supplemental 1   E-research Support SlidesESI Supplemental 1   E-research Support Slides
ESI Supplemental 1 E-research Support SlidesDuraSpace
 
Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016IzzyChad
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Managementdancrane_open
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseAnita de Waard
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College LondonSarah Anna Stewart
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportPascal-Nicolas Becker
 
Improving RDM through closer integration of electronic lab notebooks and data...
Improving RDM through closer integration of electronic lab notebooks and data...Improving RDM through closer integration of electronic lab notebooks and data...
Improving RDM through closer integration of electronic lab notebooks and data...rmacneil88
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...Sarah Anna Stewart
 

Semelhante a Integrating an electronic lab notebook with a data repository; American Chemical Society March 2014 (20)

Scottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShareScottish Digital Library Consortium Meeting: Edinburgh DataShare
Scottish Digital Library Consortium Meeting: Edinburgh DataShare
 
RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015
 
Staffing Research Data Services at University of Edinburgh
Staffing Research Data Services at University of EdinburghStaffing Research Data Services at University of Edinburgh
Staffing Research Data Services at University of Edinburgh
 
Integrating an electronic lab notebook with a university it environment rdmf ...
Integrating an electronic lab notebook with a university it environment rdmf ...Integrating an electronic lab notebook with a university it environment rdmf ...
Integrating an electronic lab notebook with a university it environment rdmf ...
 
DSpace for Data Revisited
DSpace for Data RevisitedDSpace for Data Revisited
DSpace for Data Revisited
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and Education
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
How Elns can galvanise research data management.
How Elns can galvanise research data management. How Elns can galvanise research data management.
How Elns can galvanise research data management.
 
Integrating repositories and eLab notebooks through an open science framework
Integrating repositories and eLab notebooks through an open science frameworkIntegrating repositories and eLab notebooks through an open science framework
Integrating repositories and eLab notebooks through an open science framework
 
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
 
Ppls mvm2
Ppls mvm2Ppls mvm2
Ppls mvm2
 
ESI Supplemental 1 E-research Support Slides
ESI Supplemental 1   E-research Support SlidesESI Supplemental 1   E-research Support Slides
ESI Supplemental 1 E-research Support Slides
 
Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016Planning for Research Data Management: 26th January 2016
Planning for Research Data Management: 26th January 2016
 
Planning for Research Data Management
Planning for Research Data ManagementPlanning for Research Data Management
Planning for Research Data Management
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data Support
 
Improving RDM through closer integration of electronic lab notebooks and data...
Improving RDM through closer integration of electronic lab notebooks and data...Improving RDM through closer integration of electronic lab notebooks and data...
Improving RDM through closer integration of electronic lab notebooks and data...
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
Introduction to Research Data Management
Introduction to Research Data ManagementIntroduction to Research Data Management
Introduction to Research Data Management
 

Último

Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)IES VE
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kitJamie (Taka) Wang
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarThousandEyes
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveIES VE
 
.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptxHansamali Gamage
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfTejal81
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxKaustubhBhavsar6
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfInfopole1
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTxtailishbaloch
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 

Último (20)

Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? Webinar
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
 
.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptx
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdf
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 

Integrating an electronic lab notebook with a data repository; American Chemical Society March 2014

  • 1. Electronic lab notebooks and data repositories Complementary responses to the scientific data problem ACS Dallas session on Research data and electronic lab notebooks 17 March 2014 Rory Macneil, with thanks to Sunny Yang, Robin Rice, George Hamilton and Gary Ferguson
  • 2. What we’re going to cover • The scientific data problem • ELNs and data repositories: Complementary approaches to solving the problem • 1 + 1 = 3? An example of ELN-repository integration: • RSpace and Edinburgh DataShare • Looking ahead
  • 3. Research workflow 1 And the data gets lost Publish results Document research Enter data
  • 4. The scientific data problem: How to capture data and make it available for scrutiny and reuse? • Ease of data entry drives uptake, but adding structure to data enhances value • Different types of data, different fields of research, different styles of documentation
  • 6. The scientific problem rephrased: How to add structure to data? – Who decides the parameters? • Researcher • PI • System (ELN/Repository) • Community (Dial-a-Molecule, Henry Rzepa et al) – Who adds it and how? • Researcher • Curator • Machine – When is it added? • Pre-documentation (in system) • During documentation (in ELN) • Post-documentation (in Repository) – How much scope for variation?
  • 7. How do ELNs and data repositories deal with this problem?
  • 8. What is an electronic lab notebook? Software that enables researchers to manage and present research data
  • 9. ELNs: The first generation 1990s origins for Big Pharma • Windows based • For pcs • Domain and project specific • Complex • Required lots of training • Very expensive
  • 10. ELNs: The second generation 2000s for academic researchers – Generic tools adapted by individuals • Evernote, Dropbox • Web-based, platform agnostic • Generic, flexible • Very easy to use, very cheap – Lab-oriented ELNs • eCAT, Lab Archives, Ruro • Web-based, platform agnostic • Generic, flexible • Easy to use, cheap • Sharing and groups
  • 11. ELNs: We need a third generation 2011/12 Wisconsin pilot “We need an ELN that can be rolled out across the university” It has to: • Be easy to use • Be platform agnostic • Support intra- and inter-group collaboration • Have enterprise capabilities • Support data publishing and archiving and . . . Still be affordable!!!!!!!!!”
  • 12. Institution as customer: Brave new world • ELN must provide flexibility and breadth across disciplines • University provides – Funding – Support • IT • Training • Driving – Mass uptake – Consolidation of providers
  • 13. Who and what is driving demand for ELNs? • Researchers – Utility and convenience of paper lab book + online capabilities – On multiple devices – File management/integration • Groups/PIs – Controlled sharing – Collaboration – Group management – File management/integration • Institutions: Data librarians, IT, commercialisation offices – Enterprise features: Scalable deployment, Single Sign On – IP protection: audit trail, signing – Publishing – Archiving – Repository integration – File management/integration
  • 14. Data repositories An information repository is an easy way to deploy a secondary tier of data storage that can comprise multiple, networked data storage technologies running on diverse operating systems, where data that no longer needs to be in primary storage is protected, classified according to captured metadata, processed, de-duplicated, and then purged, automatically, based on data service level objectives and requirements.
  • 15. Three types of data repositories • Domain specific – Chemistry: PubChem, ChemSpider – Life Sciences: Dryad, NCBI Databases • Institutional • Generic – Figshare
  • 16. What’s driving demand for data repositories? • Domain Specific – Who? • Research communities – Why? • Capture data for review and reuse • Improve quality and reliability of data • Standardize research techniques to improve productivity • Institutional – Who: • Funders → Data librarians – Why? • Maintain data • Make data available for reuse
  • 17. ELNs and data repositories: what do they do? Data entry Data organization Documentation Metadata creation Data and metadata export Data import Data preservation Data sharing Data reuse ELN Repository
  • 18. ELNs + Data repository enables new workflow ELN Enter data and document research Data repository Store data and metadata Publication Publish results Data is captured, structured and made available for reuse
  • 19. Research workflow 1 And the data gets lost Publish results Document research Enter data
  • 20. ELNs + Data repository enables new workflow ELN Enter data and document research Data repository Store data and metadata Publication Publish results Data is captured, structured and made available for reuse
  • 21. An example of integration: RSpace and Edinburgh DataShare
  • 22. RSpace • Conceived in response to Wisconsin RFP and trial • Developed with Wisconsin by Research Space 2012 - 2013
  • 23. Researcher experience Sketching √ Image annotation √ Chemical structures √ Notebook √ Forms √ Templating √ PDF export √ Export to Word √ File gallery √ Journal view √ Tablet friendly √ Clean design √ Performance √ Round trip editing √ Offline access √ Sample management √
  • 24. PI/Lab support Sharing √ Messaging √ Lab set up enabled √ Group management √ Inter-group collaboration √
  • 25. Institutional requirements (IT, data librarians, commercialisation) Single sign on √ Sys Admin/RSpace admin √ Group set up √ IP support √ Export to XML √ Archiving √ Repository integration √
  • 26. RSpace design advantages • Easy data entry • Easy and flexible data structuring • Multiple ways of getting data out (and back in) – Export PDF – Export Zip (XML) – Re-import, preserving structure – Archive (with metadata) • Inter – institutional support • Re-import, preserving structure
  • 27. What is Edinburgh DataShare? Edinburgh DataShare is a free-at-point-of- use data repository service which allows University researchers to upload, share, and license their data resources for online discovery and re-use by others.
  • 28. Whence Edinburgh DataShare? • The service was built as an output of the DISC-UK DataShare project, which explored pathways for academics to share their research data over the Internet at the Universities of Edinburgh, Oxford and Southampton (2007-2009, Jisc Repositories and Preservation Programme).
  • 29. Why Edinburgh Datashare? “7. Research data management plans must ensure that research data are available for access and re-use where appropriate and under appropriate safeguards.”
  • 30. Edinburgh Datashare and University RDM policy “9. Research data of future historical interest, and all research data that represent records of the University, including data that substantiate research findings, will be offered and assessed for deposit and retention in an appropriate national or international data service or domain repository, or a University repository.”
  • 31. Edinburgh Datashare and University RDM policy “10. Exclusive rights to reuse or publish research data should not be handed over to commercial publishers or agents without retaining the rights to make the data openly available for re-use, unless this is a condition of funding.”
  • 32. Does DSpace work for data? • Communities, collections, data items, files • Metadata subset from DCMI “dcterms” vocabulary, RDF-compliant – aids discovery • Persistent identifier (handle) assigned • Suggested citation provided; viewable download statistics • Single-sign on with EASE • Embargo option – delayed publication
  • 33. Quality controls – New users can register with EASE login but are given permission to add to a collection – Data Library staff or collection administrator approves each item before publication – Dataset / data item must include documentation as well as data files – Metadata field to point to related publication (“is referenced by”)
  • 34. RSpace – Edinburgh Datashare integration: Overview • Background • People – Edinburgh: Robin Rice, George Hamilton – Research Space: Sunny Yang, Gary Ferguson • Process – Understanding DataShare (DSpace + SWORD + Datashare) – Backend integration – METS (metadata encoding transaction standard) development – Interface decisions
  • 35. RSpace – Edinburgh Datashare integration: Backend platform – Edinburgh DataShare has three interfaces/APIs • Web-UI • Python • SWORD (simple Java based web-service which supports repository deposits) – RSpace uses the SWORD Interface – The SWORD server accepts a file for deposition if a METS description file is provided
  • 36. Four part METS implementation in RSpace – Datashare integration • RSpace uses the standard METS header • DMD -- field definitions are based on Dublin Core – Four required fields in Edinburgh DataShare -- contributor, publisher, title, and data creator -- must be completed as part of the deposit through RSpace – Additional optional fields can be filled in later by DataShare administrator: • FUNDER, SPATIAL_COVERAGE, TIME_PERIOD, DATA_CREATOR, AVAILABLE_DATE, DESCRIPTION_ABSTRACT, DESCRIPTION_TOC, LANGUAGE, RELATION_VERSION_OF, RELATION_REFERENCED_BY, SUPERCEDES, RIGHT, SOURCE, SUBJECT_KEYWORDS, SUBJECT_CLASSIFICATION, ALTERNATIVE_TITLE • All zipped files and their mime-types (e.g. application/pdf, text/html) are included • A structure map describes the full structure and relationships between the above three elements
  • 37. RSpace – Edinburgh Datashare integration: Workflow • Front end trigger – An RSpace user selects files/folders/notebooks to be deposited from RSpace, and starts the deposit process • Backend to support the user workflow – RSpace extracts the associated data and resources from its database and file-store – These are turned into xml files – METS is used to describe the zip file and each selected file – The xml, resource, and METS files are zipped into a zip file for archiving – The DSpace SWORD client deposits the zip file to DataShare after an authentication and validation procedure – File deposited in Collection associated with Depositor
  • 38. User workflow 1 My records in RSpace
  • 41. User workflow 2c Selecting record(s) for export
  • 42. User workflow 2d Selecting record(s) for export
  • 44. User workflow 4 Completed deposit dialog
  • 46. User workflow 6 DataShare agreement message
  • 47. User workflow 7 DataShare agreement email
  • 48. User workflow 8 Submission approval email
  • 49. User workflow 9 Submission screen in DataShare
  • 50. Looking ahead 1: Integrating ELNs with other digital resources – Other institutional repositories – Domain specific repositories • PubChem, ChemSpider – Archives • Institutional • Commercial – Chemical Semantics – CDD Vault
  • 51. Future workflow 1 ELN Enter data and document research Data repositories Store data and metadata Publications Publish data and results
  • 52. Looking ahead 2: ELN as interactive publishing platform Institutional repositories Datasets (Publications) Archives Domain repositories ELN Data Metadata
  • 53. ELN transformed from point tool to infrastructural glue at the heart of the research process
  • 54. ELN also becomes the backbone of data management at the University Integration with – Data storage – Data repository – Data archive
  • 55. Scientific data: whose problem? Funders Communities Researchers PIs/Labs Institutions
  • 56. Which model(s) will win? Those that best satisfy the core requirements of the key constituencies • Easy data entry and usability – Researchers, Labs/PIs • Easy integration with write up/publication – Researchers, Labs/PIs • Enhanced collaboration and better management of data and research – PIs/labs, Institutions • Comprehensive and easy data capture and availability for reuse – Institutions/Communities • Flexibility to support different workflows and outputs – Researchers, Labs/PIs, Communities • And by the way, as Wisconsin said: It has to be affordable!