ODIN Project Presentation to CLOSER Leadership Team
1. ODIN –
ORCID and DATACITE Interoperability Network
Presentation to CLOSER Leadership Team
November 2012
John Kaye – British Library
www.slideshare.net/johnkayebl www.odin-project.eu
Funded by The European Union Seventh Framework
Programme
2. Overview
• Overview
• Project Structure
• Humanities and Social Science Proof of
Concept
• High Energy Physics Proof of Concept
• Results
• Commonalities
• Risks
3. Overview
• 2-year project funded under EC FP7 Coordination and Action Programme
• ORCID (Open Researcher and Contributor ID Initiative)
• DataCite Consortium – BL is UK registration agent
• Partners: ORCID, DataCite, BL, CERN, Dryad, arXiv, ANDS
• Build on ORCID and DataCite initiatives to uniquely identify and connect
scientists and datasets
• ‘Datasets’ has a broad definition (anything but journals) so can include grey
literature, presentations, code etc.
• Connect information across multiple services and infrastructures for
scholarly communications
4. Overview
• Infrastructure already exists for researchers to build up an open
portfolio of research objects
• Register an ORCID iD at www.orcid.org and link published papers
using ORCID’s tools
• Unpublished outputs (working papers, datasets) can be deposited
in figshare (http://figshare.com/), given a DataCite DOI, and then linked
back and added to the ORCID profile
• ODIN wants to expand on this principle and engage with data
centres and institutional repositories to allow easier, more open
discovery of non-traditional research outputs.
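The linking step described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not ODIN's or ORCID's own tooling: the function names and the record layout are assumptions made for the example, and the DOI shown is hypothetical. The check-character calculation follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers.

```python
import re

# Shape of an ORCID iD: four groups of four, last character a digit or X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check both the layout and the check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


def link_output(orcid: str, doi: str, kind: str) -> dict:
    """Build a simple (hypothetical) record tying a contributor to an output."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid}")
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi}")
    return {
        "contributor": f"https://orcid.org/{orcid}",
        "object": f"https://doi.org/{doi}",
        "kind": kind,
    }


# 0000-0002-1825-0097 is the sample iD used in ORCID's documentation;
# the DOI below is made up for illustration.
record = link_output("0000-0002-1825-0097", "10.1234/example.dataset", "dataset")
```

Validating the identifier before storing the link keeps mistyped iDs out of the network, which matters once records are exchanged between services.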
6. Proofs of Concept Objectives
• Develop two disciplinary proofs of the concept of open and interoperable
persistent identifiers of data and contributors in scholarly communication, in
a variety of current and future scenarios.
Specific goals:
• Prove the ability to navigate across data and contributors in the Humanities and Social
Sciences (HSS), where data and contributors are separated in space and time, with curators
bridging the gap;
• Prove the ability to navigate across data and contributors in High-Energy Physics (HEP),
where multiple versions of articles in preliminary and final form, with several thousand
contributors, need to be associated with a corresponding dataset hosted in different
systems;
• Identify, by a critical analysis of the proofs of concept, common issues in open and
interoperable permanent identifiers of data and contributors, by establishing a common
cross-disciplinary view on the relevant workflows
7. Deliverables and Time frames
• D3.1 HSS Proof of Concept – Aug 2013
• D3.2 HEP Proof of Concept – Aug 2013
• D3.3 Commonalities – Sept 2014
• MS5 Commonalities Identified – Jan 2014
• D3.1 and D3.2 validated by the community at the first-year
event
• Input from ANDS and arXiv
9. HSS: Birth Cohort Studies
• Why Birth Cohort Studies?
• Investment
• Established/Long history
• Tradition of data curation
• High Re-use
• Derived Data
• Multi-disciplinary
• BL Involvement in CLOSER (Cohort and Longitudinal
Studies Enhancement Resource)
10. HSS: Current Status
• HSS British Birth Cohort characteristics:
• High re-use of data
• Data analysed across cohorts (e.g. 1958 questions alongside 2000)
• Derived data often kept outside original repository
• Lots of ‘grey literature’ (working papers, pre-prints etc.)
• Different publication spaces (publishers, institutional repositories)
• Challenges:
• Uniquely associate articles/datasets with authors/contributors from a range of
data sources
• Authors/creators/researchers go back a long way (could be as early as 1946)
• How to deal with non-digital research outputs
• How to deal with cross-cohort analysis (multiple datasets, derived datasets)
• Associate datasets with articles and track impact of data re-use
• Survey questions often more important to identify than actual survey (survey
contains thousands of variables)
11. HSS: Objectives
• Identify workflows and develop conceptual model
• Provide technical solutions for identifying and connecting data creators,
authors, researchers, contributors and research objects related to British
Birth Cohort Studies
• Identify, use and link existing identifiers and data sources where possible
• Identify deficiencies in identification or relationship data and develop or
propose solutions
• Work with the research community to develop user case studies and data
collection and enhancement
• Create an open and interoperable network linking people and research
objects to allow Impact Tracking and Resource Discovery
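The network named in the last objective can be pictured as a small graph of people and research objects. The sketch below is only one possible reading, not the project's conceptual model: the class, the node identifiers, and the relation names (`derived_from`, `cites`, `authored_by`) are all assumptions introduced for illustration.

```python
class ResearchGraph:
    """A toy graph of people and research objects (hypothetical model)."""

    REUSE_RELATIONS = {"derived_from", "cites"}

    def __init__(self):
        self.nodes = {}    # node id -> node type
        self.triples = []  # (source id, relation, target id)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of an edge must be registered nodes")
        self.triples.append((source, relation, target))

    def impact_of(self, dataset_id):
        """Return every object that reuses the dataset, directly or indirectly."""
        found = set()
        frontier = [dataset_id]
        while frontier:
            current = frontier.pop()
            for source, relation, target in self.triples:
                if (target == current and relation in self.REUSE_RELATIONS
                        and source not in found):
                    found.add(source)
                    frontier.append(source)
        return found


graph = ResearchGraph()
graph.add_node("cohort:1958", "birth cohort dataset")
graph.add_node("cohort:1970", "birth cohort dataset")
graph.add_node("derived:1", "derived dataset")
graph.add_node("paper:wp1", "grey literature")
graph.add_node("article:a1", "published article")
graph.add_node("person:p1", "researcher")

# A cross-cohort derived dataset, a working paper using it, and an article
# citing the 1958 data directly.
graph.add_edge("derived:1", "derived_from", "cohort:1958")
graph.add_edge("derived:1", "derived_from", "cohort:1970")
graph.add_edge("paper:wp1", "cites", "derived:1")
graph.add_edge("article:a1", "cites", "cohort:1958")
graph.add_edge("paper:wp1", "authored_by", "person:p1")

impact_1958 = graph.impact_of("cohort:1958")
impact_1970 = graph.impact_of("cohort:1970")
```

Following reuse links transitively is what lets a working paper built on a derived dataset count towards the impact of the original cohort study, even though it never cites that study directly.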
12. HSS Proof of Concept
[Diagram: people (Data Creator, Researcher, Author) linked to research objects for the 1958 and 1970 cohorts. Object types: Birth Cohort Study dataset, Non-Birth Cohort Study dataset, Derived dataset, Grey Literature, Published article. Relationships: Citation, Data Creator, Derived Data Creator, External Data input, Author: Grey lit, Author: Article. External data includes Census, Health etc.]
16. HSS: Identifiers and Data Sources
• Researchers etc.: ORCID, ISNI, JISC Names, SCOPUS, Surveys, Citation DBs,
UK Data Service, Catalogue metadata
• Source Datasets: DataCite DOIs, ESDS
• Derived Data: DataCite DOIs, Institutional IDs, No IDs, ESDS, Surveys, Institutional
Repositories
• ‘External’ Data: DataCite DOIs, Institutional IDs, No IDs, ESDS, Other datacentres,
NHS, Institutional etc.
• Grey Literature: DataCite DOIs, Institutional IDs, No IDs, Surveys, ESDS,
Institutions
• Published Literature: CrossRef DOIs, Institutional IDs, No IDs, SCOPUS, Surveys,
ESDS, Institutions, Citation DBs, Catalogue metadata
18. Current status (I)
HEP (High-Energy Physics) field specificities:
• Multiversioning: from preprint versions through to final publication
• Hyperauthorship: hundreds or thousands of scientists signing the
same article
• Data levels of abstraction (CERN, Inspire, HEPData)
• Different publication spaces (arXiv, Inspire, publishers)
Challenges:
• Author identification: improving the disambiguation
process already in place
• Uniquely associating articles/datasets with authors/contributors
• Version management during the long publication process
20. Current status (III)
Disambiguation process among thousands of authors:
• Names and affiliations
• Different ways to write the same information
• Clustering algorithm
• Current Inspire interface
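To make the disambiguation problem concrete, here is a deliberately naive sketch of grouping author signatures by normalised name and affiliation. It is not Inspire's clustering algorithm, which the slide does not describe; the normalisation rules, function names, and sample signatures are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def name_key(name: str) -> tuple:
    """Reduce 'Surname, Given' or 'Given Surname' to (surname, first initial)."""
    name = name.strip()
    if "," in name:
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return (surname.lower(), given[:1].lower())


def affiliation_key(affiliation: str) -> str:
    """Lower-case and drop punctuation and spaces, so 'C.E.R.N.' matches 'CERN'."""
    return re.sub(r"[^a-z0-9]", "", affiliation.lower())


def cluster_signatures(signatures):
    """Group signature indices that share a name key and an affiliation key."""
    groups = defaultdict(list)
    for index, (name, affiliation) in enumerate(signatures):
        groups[(name_key(name), affiliation_key(affiliation))].append(index)
    return list(groups.values())


# Invented signatures: the same person written two ways, a namesake at
# another institution, and an unrelated author.
signatures = [
    ("Doe, Jane", "CERN"),
    ("J. Doe", "C.E.R.N."),
    ("Jane Doe", "British Library"),
    ("Roe, R.", "CERN"),
]
clusters = cluster_signatures(signatures)
```

Exact-key grouping like this breaks down quickly with thousands of co-authors (shared initials, changing affiliations), which is why persistent contributor identifiers are attractive as a complement to name-based clustering.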
21. Phase 2:
Results and Commonalities
• Results to feed into Hackathon event and strategy
• Assessment and validation by research community and international
partners
• BL and CERN come together to find commonalities in the disciplines to
inform WP4 (interoperability)
• This process will incorporate knowledge from the results of the
Hackathon as well as the conceptual model for global interoperability of
data and contributor identifiers developed in WP4
• This task will result in a more comprehensive view of disciplinary and
interdisciplinary needs, and will produce information to be shared
internally with the other work packages
22. Questions?
John Kaye – Lead Curator Digital Social Sciences
The British Library
96 Euston Road
London NW1 2DB
john.kaye@bl.uk
Twitter: @johnkayebl
Telephone: 020 7412 7450
Project Website http://odin-project.eu/
Blog: http://britishlibrary.typepad.co.uk/socialscience/