1. Research data: what can libraries do?
Zaven Akopov
Deutsches Elektronen-Synchrotron DESY
Zaven Akopov | “Research Data: What can libraries do?” | Helmholtz Open Access Workshop
June 11, 2013 | Page 1
2. Contents
• Data preservation as an intrinsic part of Open Data
• HEP Data: challenges and specifics
• Requirements for documentation and long-term
storage
• Types of documents
• High-level (secondary) data
• Assignment of metadata and long-term storage in
INSPIRE
3. Data Preservation as an intrinsic part of Open Data
• Data management cycle in HEP
• data taking – initial storage – data processing –
storage of processed data (high level data) – physics
analysis (software) – publication of papers
(interpretation)
• Initially not planned for open access, but limited
lifetime of experiments …
• Providing infrastructure for data preservation
may pave the way to sustainable (open)
access to data
4. HEP Research Data: Motivation
• Accelerators, detectors
• Unique experimental data, usually not reproducible in
other labs
• Large resources and investments are needed to build
detectors and to provide manpower for data analysis
• At the end of experimental data taking, a substantial
amount of data is still not analyzed
• The data can also be processed and analyzed with new
methods, models, and approaches developed over time
• Bottom line: HEP data should be made accessible and
be preserved
5. Efforts and models
• DPHEP Working group active since 2009:
www.dphep.org
• 4 “levels” of HEP Research data and its preservation
have been identified
6. Efforts and models
• Most collaborations and labs plan the “Level 4
preservation” (raw data)
• The requirements of “Level 1” and, in part, of “Level 2”
can and should be fulfilled using a high-end
bibliographic system, with metadata assignment, etc.
• This is where the library can help to close the gap in
the complete data preservation cycle:
• Identifiable data
• Focus on data access: preservation is an integral part of
data management.
7. Specific cases
• To provide the know-how necessary for re-use, not only
the (primary) data and the analysis software are needed,
but also the associated documentation on data
taking and analysis (technical guides, internal notes, …)
• … which provides the basis for the corresponding
publications, but also substantially more
information, e.g.:
• Details of the data analysis methods (software,
simulation, …)
• Detectors and their components (pedestals,
operating parameters, …)
• Need to preserve: secondary data (tables, ROOT scripts,
code, plots), i.e. simplified data from Level 2
8. Long-term storage
• Example: HERA (DESY)
• By the end of running, storage on collaboration/IT
structures (servers run by the staff or IT)
• Websites, AFS space, proprietary structures, etc.
• Lack of a real bibliographic system (metadata,
complex search engine, …)
• No consistent strategy and no sustainability:
these structures would not be preserved by
DESY/IT
-> provide infrastructure
9. Why INSPIRE?
• Ingestion of the technical documents and analysis
notes, with the possibility to interlink them with the
actual publications (based on…, superseded by…)
• The documents are preserved and do not
depend on the life expectancy of the specific
experiment and its IT infrastructure
• Many INSPIRE features such as full-text search, data
object citation, etc.
• The secondary data (high-level data) provide
added value to the existing publications
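The interlinking of notes and publications mentioned on this slide can be pictured with a small sketch. This is only an illustration under assumptions: the `Record` class, its field names (`based_on`, `superseded_by`), and the record ids are hypothetical and are not INSPIRE's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical bibliographic record (not INSPIRE's real schema)."""
    rec_id: str
    kind: str                                           # "internal_note" or "publication"
    based_on: List[str] = field(default_factory=list)   # ids of supporting notes
    superseded_by: Optional[str] = None                 # id of the newer version, if any


def latest_version(records: Dict[str, Record], rec_id: str) -> str:
    """Follow 'superseded by' links until the current version is reached."""
    seen = set()
    # The 'seen' set guards against accidental cycles in the links.
    while records[rec_id].superseded_by is not None and rec_id not in seen:
        seen.add(rec_id)
        rec_id = records[rec_id].superseded_by
    return rec_id


# Made-up example: a preliminary note, its revision, and a paper based on it.
records = {
    "note-1": Record("note-1", "internal_note", superseded_by="note-2"),
    "note-2": Record("note-2", "internal_note"),
    "paper-1": Record("paper-1", "publication", based_on=["note-2"]),
}
```

With such links in place, a reader starting from an outdated note can be led to its current version, and a reader of a paper can find the notes behind it.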
10. INSPIRE Features
• Google-like speed for up to 2M records
• Combined search of Metadata, References,
Fulltext
• Scalability
• Flexible metadata (multimedia, secondary data)
• Personalisation (claim your paper)
• One-stop-shop for HEP information
• Fulltext repository
• Integration of research data
20. Long-term access
• Access control for the documents (notes)
• Working together with the labs to develop
effective access control strategies
(short/long-term)
• Simple user accounts are live
• Curator accounts are live
• Further options
• Flexible user accounts (based on author
lists, external authentication via SSO), e.g.
arXiv account
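One way to picture the account types listed on this slide is the following sketch. It is an assumption-laden illustration, not the access model actually deployed: the `User` and `Note` classes, the rule that members read and curators modify, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical account; membership could be derived from author lists."""
    name: str
    member_of: frozenset = frozenset()    # collaborations the user belongs to
    curator_of: frozenset = frozenset()   # collaborations the user curates


@dataclass(frozen=True)
class Note:
    """Hypothetical internal note owned by one collaboration."""
    title: str
    collaboration: str
    public: bool = False


def can_read(user: User, note: Note) -> bool:
    # Public notes are readable by anyone; restricted notes only by
    # members or curators of the owning collaboration.
    return (note.public
            or note.collaboration in user.member_of
            or note.collaboration in user.curator_of)


def can_modify(user: User, note: Note) -> bool:
    # Only curators of the owning collaboration hold modification rights.
    return note.collaboration in user.curator_of


# Made-up example accounts and a restricted note.
member = User("member", member_of=frozenset({"ZEUS"}))
curator = User("curator", curator_of=frozenset({"ZEUS"}))
outsider = User("outsider")
note = Note("Preliminary analysis note", "ZEUS")
```

Keeping the read and modify checks separate makes it easy to relax one of them later, for example when a lab decides to open its notes after an embargo period.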
21. Summary
• Collaboration of DESY Library (INSPIRE) with DPHEP
• HERMES, ZEUS, H1 experiments
• Internationally: DØ, CDF (Fermilab), BaBar
(Stanford)
• First stage completed for all HERA experiments and
DØ (Fermilab): all of the internal documentation is
stored in INSPIRE
• Second stage completed: collaboration curator
accounts with modification rights are also live and a
success
• Test phase: ZEUS and BaBar preliminary notes
harvested; the rest should follow
• High-level data in INSPIRE: HEPData, plots, etc.;
INSPIRE provides long-term preservation.