Small Data: How Elsevier Might Help with Research Data Management

David Marques
27 February 2013

Research Data Symposium
Columbia University

Assertions

• We share a common goal: an open system of
ubiquitous sharing of research data in
repositories that are
– discipline-specific
– controlled-vocabulary annotated
– normalized
• A very small portion of research data is being
shared to the discipline-specific repositories

2

Problem statement

• There are a lot of barriers to sharing of data
• There are problems with sustainable funding for
repositories

3

Points of this presentation

• We can help remove the barriers by
– applying rigorous yet efficient process
– using discipline-specific informatics skills
– providing credit assignment and assessment
– helping capture metadata early and in digital form

• It is possible to create sustainable funding models for
open data repositories, and we can help

4

Big Data vs Research Data

[Figure: data life cycle taken from DataONE. Stages: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze. Annotations: 'Big Data' Emphasis; Research Data Pain.]

5

Dataset Repositories: MANY solutions
• Figshare [http://figshare.com/] (Digital Science)
• GigaDB [http://gigadb.org/] (BioMed Central)
• DataDryad [http://datadryad.org/]
• Australian National Data Service [http://www.ands.org.au/]
– but: their goal is to move from

• Amazon’s Glacier [http://aws.amazon.com/glacier/]
6

Problem 1: Barriers to Data Disclosure and Sharing

• Non-digital metadata
• Different skill sets
• Takes time and mindset away from research
• Requires common nomenclature
• Cost
• It is a long-tail problem: thousands of narrow
solutions provide the best value
• Open to misinterpretation
• Lack of credit
• Intellectual property or possible patent issues
• Easier contradiction
• No incentive, little value to the sharer
• Privacy and security concerns

7

Are supplemental files the answer?
• Scope
– 15% of 2012 Elsevier articles had supplemental files
– ~ 1% have spreadsheets
– ~ 2% have either spreadsheets or zip files
• Extracting value
– no rules for supplemental files
– no common nomenclatures
– analytics, comparisons, trends are hard
• Elsevier recommends (and some journals such as Cell Press
journals require) that authors share/deposit data in
discipline repositories
• Linking helps overcome the credit barrier
– Elsevier links articles to/from datasets in open repositories
– 35 today (including EarthChem)
– 10 more in progress

8

Problem 2: Sustainability

• Many are grant-funded initially, as research projects – and
funding bodies often do not intend to fund repositories long
term

• Can we fund from a Gold Open Access model?
• Can we fund from high-end analytics subscriptions?
• Can we fund some of them from health care and corporate
use?

13

[Figure: chart of percentage shares by activity. Segment labels: PLAN, PROPOSE, SUPPORT SERVICES, ACQUISITION, ACCESS, STORAGE, DATA MANAGEMENT, INGEST, PRODUCE/PUBLISH, MANAGE. Percentages shown: 10%, 25%, 19%, 15%, 6%, 25%.]

• ACQUISITION
– submission agreement
– data formats
– IP rules
– user documentation and support
• ACCESS
– searching and ordering
– user guides
– delivery of result sets and reports
• INGEST
– receive
– QA and validation
– transform
– create metadata (taxonomies)
– updates
– reference linking

Summary of data in: Keeping Research Data Safe 2, Beagrie et al., 2010, funded by JISC

14

Pain Points and Elsevier Strengths and Expertise
• Taxonomies
– 50+ discipline-specific taxonomies – core to Elsevier
• At-scale, efficient, best-practices process
• At-scale analytics

• Turning freely-available data into high-value solutions for corporate use
without advertising (advertising models require very large customer groups)

• Impact analysis and reporting
15

Research Data Services – new group at Elsevier
• Goals
– Increase archiving and sharing of research data (as
requested by funding bodies)
– Increase the value of shared data (with metadata)
– Foster and assist with the credit and impact assessment of
research data for the researcher, the institution, and the funding
bodies
– Increase the sustainability of data repositories
• Principles
– Open data – all data remain open and available
– Collaborative – with institutions, the research community,
funding bodies
– Transparent business model – if we make money, some goes
back to fund the repositories
16

Research Data Management

[Figure: data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) annotated with pilots. Other legible labels: Linked Data Repositories; Method Tools; RDM (VizTrails); standardize method and provenance; Repositories, Data Mgmt Plans; Taxonomies, Directories. The remaining labels, including one pilot beginning "Pilot: user", are not legible.]

• Pilot: see if we can scale a repository and make it financially
sustainable
• Pilot: collecting data with an app, integrating and sharing with a
dashboard
• Pilot: collect and connect data from different repositories to create
insight
• Pilot: annotate data and methods with standard taxonomies
• Pilot: create directories to help discover data in shared repositories
• IEDA/EarthChem collaboration with Kerstin Lehnert

17

Disclosure Pilot Benefits for the Researcher

• Immediate visibility and overview of the research (PI
Dashboard)
• Enhanced discoverability of research data attributable to the
university and the research team
• Credit/impact for the university, the research team, and the
funding bodies
• Acknowledgement by the funding bodies of the
disclosure/sharing of the data
• [better, faster science]

18

Disclosure Pilot Benefits for the Institution

• Increased rigor of data management
– consistency
– best practices
– overview metadata in research management information systems
• Step toward completeness of research data management
• Compliance with funding body requirements, stronger base
from which to request
• Increased visibility, discoverability, credit

19

Disclosure Pilot Benefits for the Funding Body

• Increased data disclosure and sharing
• Increased discoverability of data (with funding body credit)
• Increased opportunity for ‘fourth paradigm’ (analytics-
derived) science – better, faster science
• Credit/impact for sponsored research
• Standardization and best practices in data management plans
and actual data curation/preservation

20

Today’s funding models

[Figure: funding diagram. Labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.]

21

Increasingly common models

[Figure: funding diagram. Labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.; Translational Medicine Analytics]

22

Working together, we could do this

[Figure: funding diagram. Labels: Research funding; Data mgmt (Gold OA); FREE; License or subs.; Task-specific Analytics]

23

An interesting quote at the IDCC13 cost workshop

[loosely quoted, I did not catch it verbatim]

We can’t do this by ourselves. We should get someone with
business savvy to partner with us.

24
