This document summarizes a presentation on research data metrics from the NISO Altmetrics Working Group B. It discusses various metrics for research data, including citations of datasets and metadata, full-text search of datasets, downloads, and usage statistics. It also describes projects from DataCite and the Making Data Count initiative that are working to develop standard metrics for research data and make them available via APIs. Future work discussed includes analyzing networks of linked datasets and second-order citations.
NISO Working Group Connection LIVE! Research Data Metrics Landscape
1. NISO Working Group Connection LIVE!
Research Data Metrics Landscape:
An update from the NISO Altmetrics Working Group B: Output Types &
Identifiers
Monday, November 16, 2015
Presenters:
Kristi Holmes, PhD, Director, Galter Health Sciences Library, Northwestern University
Mike Taylor, Senior Product Manager, Informetrics, Elsevier
Philippe Rocca-Serra, PhD, Technical Project Leader, University of Oxford e-Research Centre
Tom Demeranville, THOR Senior Project Officer & ORCID Software Engineer
Martin Fenner, Technical Director, DataCite
Dr. Sarah Callaghan, Senior Researcher and Project Manager, British Atmospheric Data Centre
Dr. Melissa Haendel, Associate Professor, Ontology Development Group, OHSU Library, Dept of
Medical Informatics and Clinical Epidemiology, Oregon Health & Science University
http://www.niso.org/news/events/2015/wg_connections_live/altmetrics_wgb/
13. Goals
What metrics for research data do researchers and data managers want?
Do data repositories make these metrics available?
If not, build services to collect these metrics for the DataONE repository network.
14. How interested would you be to know each of the following about the impact of your data?
http://doi.org/10.1038/sdata.2015.39
http://www.dx.doi.org/10.5060/D8H59D
18. Metadata of articles
References are part of the metadata deposited to CrossRef.
The Cited-by service aggregates these citations for CrossRef DOIs.
Work is underway to exchange DOI <-> DOI links between CrossRef and DataCite (see the sketch below).
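As a hedged illustration of what those deposited references look like in practice: the CrossRef REST API returns any deposited reference list per DOI. The article DOI and the dataset-prefix filter below are illustrative assumptions, not part of the talk.

```python
# Minimal sketch: list deposited references for one article DOI and flag
# likely dataset DOIs. The DOI and the 10.5061 (Dryad) prefix filter are
# illustrative assumptions; many publishers do not deposit references.
import requests

article_doi = "10.1371/journal.pone.0115253"  # hypothetical example article

resp = requests.get(f"https://api.crossref.org/works/{article_doi}")
resp.raise_for_status()
work = resp.json()["message"]

for ref in work.get("reference", []):       # present only if deposited
    doi = ref.get("DOI", "")
    if doi.lower().startswith("10.5061/"):  # crude dataset-DOI heuristic
        print("possible dataset citation:", doi)
```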
26. Future Work
• Collect data citations from CrossRef
• Analyze usage statistics in more detail and provide input to COUNTER and NISO
• Analyze the network graph, e.g. linked datasets and second-order citations (see the sketch after this list)
• Turn the research project into a service, including integration of client applications for search and reporting
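To make the network-graph bullet concrete, here is a minimal sketch of second-order citation counting over DOI-to-DOI links; the link records and DOIs are toy data standing in for what CrossRef/DataCite would supply.

```python
# Toy sketch: walk two hops of DOI-to-DOI citation links out from a dataset.
# LINKS stands in for real CrossRef/DataCite link records (all DOIs invented).
import networkx as nx

LINKS = {
    "10.5061/dryad.example": ["10.1234/article-a", "10.1234/article-b"],
    "10.1234/article-a": ["10.1234/article-c"],
}

def citing_dois(doi):
    """Return DOIs of works recorded as citing `doi` (toy lookup)."""
    return LINKS.get(doi, [])

def second_order_citations(dataset_doi):
    graph = nx.DiGraph()
    first = set(citing_dois(dataset_doi))
    for article in first:
        graph.add_edge(article, dataset_doi)   # article cites the dataset
        for second in citing_dois(article):
            graph.add_edge(second, article)    # second-order citation
    return graph, [n for n in graph if n != dataset_doi and n not in first]

graph, second = second_order_citations("10.5061/dryad.example")
print("second-order citations:", second)       # ['10.1234/article-c']
```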
27. Introducing the Metadata Model v1
Philippe Rocca-Serra, PhD, University of Oxford e-Research Centre, on behalf of the WG3 Metadata WG
Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego
28. A trans-NIH funding initiative established to enable biomedical research as a digital research enterprise:
• Facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable
• Conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data
• Provide a catalog to enable researchers to find and cite research datasets, and ease the use of community standards to annotate datasets
29. Lucila Ohno-Machado (PI), Jeff Grethe
31. Pilot applications that ‘dock’ with the prototype and
community-driven activities via Working Groups:
1. BD2K Centers of Excellence Collaboration
2. Data Identifiers Recommendation
3. Metadata Specifications
4. Use Cases and Testing Benchmarks
5. Dataset Citation Metrics
6. Criteria for Being Included in the DDI
7. Machine Actionable Licenses
8. Ranking Algorithm
9. End User Evaluation Criteria
10. Repository Collaboration
11. Outreach Meeting: Repository Operators
12. Standard-driven Curation Best Practices
13. Evaluation of Harvesting and NLP Pilot Projects
All this by August 2017!
32. Joint effort with the BD2K Center for Expanded Data Annotation and Retrieval (CEDAR)
Synergies with the BD2K cross-centers Metadata WG (co-chaired by M. Musen/CEDAR and G. Alter/bioCADDIE) and ELIXIR activities
33. WG3 Metadata - Goals
Define a set of metadata specifications that support the intended capability of the Data Discovery Index prototype (being designed by the bioCADDIE Core Development Team) as outlined in the White Paper:
• Core metadata, designed to be future-proofed for progressive extensions (phase 1: May-July 2015), followed by a test and implementation phase
• Domain-specific metadata for more specialized data types (phase 2)
Use cases and competency questions have been used throughout the process to define the appropriate boundaries and level of granularity: which queries will be answered in full, which only partially, and which are out of scope.
34. WG3 Metadata - work to date
With contributions and comments from several WG3 members and colleagues, in particular: Joan Starr, George Alter, Ian Fore, Kevin Read, Stian Soiland-Reyes, Muhammad Amith, Michel Dumontier…
35. Standard Operating Procedure (SOP)
Contains lists of material reviewed:
• data discovery initiatives and metadata initiatives
• existing meta-models for representing metadata elements
Outlines the approach used to identify metadata descriptors:
• via use cases and competency questions (top-down approach)
• by mapping generic and life-science-specific metadata schemas (bottom-up approach)
Listed in the BioSharing collection for bioCADDIE.
The results of both approaches have been compared and converged on the core set of metadata.
37. List of Metadata Schema considered
• schema.org
• datacite
• hcls dataset descriptors
• biosample
• geo miniml
• prideml
• isatab/magetab
• ga4gh metadata schema
• sra xml
• bioproject
• cdisc sdm / element of bridge model
39. Use Cases and Derived Metadata: selected competency questions
• A representative set from the use-cases workshop, the white paper, community submissions, and Phil Bourne
• Questions have been abstracted; key metadata elements have been highlighted, color-coded and categorized
• As the set of core and extended metadata elements is defined, it will become clearer which questions the Data Discovery Index will be able to answer in full and which only in part
41. Processing use cases
• All use cases on an equal footing
• Term binning: Material, Process, Information, Property
• Relation identification
42. Core metadata elements and initial model
The combined approaches have delivered a set of core metadata elements; in phase two, these will be progressively extended to domain-specific elements as needed.
We aim for maximum coverage of the use cases with a minimal number of data elements, but we foresee that not all questions can be answered in full.
[Figure: Initial Set of Metadata Elements]
44. Everything is on GitHub
45. Formal specifications
Metadata schema in JSON:
• https://github.com/biocaddie/WG3-MetadataSpecifications/tree/master/json-schemas
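To make the idea concrete, here is a hedged sketch of how such a JSON Schema could be applied; the schema and record below are simplified stand-ins invented for illustration, not the actual files in that repository.

```python
# Minimal sketch, assuming a simplified stand-in schema (the real bioCADDIE
# JSON Schemas live in the repository linked above).
import jsonschema

core_schema = {
    "type": "object",
    "required": ["identifier", "title", "creators"],
    "properties": {
        "identifier": {"type": "string"},
        "title": {"type": "string"},
        "creators": {"type": "array", "items": {"type": "string"}},
    },
}

dataset_record = {  # hypothetical dataset description
    "identifier": "doi:10.1234/example",
    "title": "RNA-seq of mouse liver",
    "creators": ["K. Smith"],
}

# Raises jsonschema.ValidationError if the record does not conform.
jsonschema.validate(dataset_record, core_schema)
print("record conforms to the core schema")
```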
46. What’s next ?
With this work phase 1 has been completed
We have entered the evaluation phase
the model will be implemented and tested by the
bioCADDIE Development Team with a number of data
sources
the results will inform the activities in phase 2, where
the metadata elements and the model may be
revised, simplified and/or enriched, as needed
Supported by the NIH grant 1U24 AI117966-01 to the University of
47. Take Home Message
• Primary goal: provide a general-purpose metadata schema that allows harvesting of key experimental and data descriptors from a variety of resources and enables indexing to support data discovery
– relations between authors, datasets, publications and funding sources
– nature of biological signal, nature of perturbation, …
48. Outstanding issues
• prioritizing the use cases
• defining mechanisms to deal with domain-specific, granular data
• moving into phase 2 and devising data ingesters
– ETL activities
– interaction with other modeling efforts
• incorporating feedback from users and developers
50. ORCID, Metrics and Project THOR
Tom Demeranville
Senior Technical Officer – Project THOR
NISO Webinar, November 2015
51. What is ORCID?
ORCID is an infrastructure that provides unique person identifiers. ORCID is a hub for linking identifiers for people with their activities. ORCID is researcher-centric, with 1.7 million registered identifiers.
ORCID records are managed by the researchers themselves.
ORCID is open source, community-governed and non-profit.
ORCID has a public API that allows querying of non-private data. ORCID has a member API that enables updating and notifications. ORCID IDs are associated with over 4 million unique DOIs.
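As a hedged illustration of that public API, the sketch below fetches the works attached to one iD using today's v3.0 endpoint (the API available in 2015 was an earlier version); the iD is ORCID's well-known example record.

```python
# Sketch against the current public API (v3.0; the 2015-era API differed).
import requests

orcid_id = "0000-0002-1825-0097"  # ORCID's documented example record

resp = requests.get(
    f"https://pub.orcid.org/v3.0/{orcid_id}/works",
    headers={"Accept": "application/json"},
)
resp.raise_for_status()

# Works are grouped by external identifier; print the first summary of each.
for group in resp.json().get("group", []):
    summary = group["work-summary"][0]
    title = summary["title"]["title"]["value"]
    print(f'{summary.get("type")}: {title}')
```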
52. 347 members, 4 national consortia, over 200 integrations
[Pie chart, members by sector: research institutions 68%, publishers 12%, funders 5%, associations 6%, repositories 9%]
[Pie chart, members by region: Europe 58%, North America 26%, Pacific 7%, Asia 5%, MEA 3%, Latin America 1%]
53. What ORCID isn’t
orcid.o16 November
2015
5
7
ORCID is not a CRIS system
ORCID is not a researcher profile system
ORCID is not a research activity metadata
store
54. Research outputs
• ORCID includes links to publications, patents, datasets, software and more.
• ORCID uses the CASRAI Output vocabulary for work types.
• ORCID references over 20 other output identifiers (more are being added!)
57. ORCID and Metrics
• ORCID improves the quality of research information and makes gathering and disseminating it easier.
• Other services use ORCID IDs to improve their data.
• ORCID IDs are found in DOI metadata, funder systems, publishers, CRIS systems, national reporting frameworks and more.
• Institutions can discover researcher-curated standard and non-standard outputs, or be notified when they are added.
58. Project THOR
http://project-thor.eu
An EC-funded, 2.5-year H2020 project to:
• establish seamless integration between articles, data, and researchers across the research lifecycle
• make persistent identifier use for people and research artefacts the default
Both human and technical in scope.
61. What THOR are up to
http://project-thor.eu
Research - Deciding what needs to be done
Integration - Doing what needs to be done
Outreach - Getting others involved
Sustainability - Making sure it lasts
63. Organisation identifiers
http://project-thor.eu
Community-driven consensus on requirements is needed; we need a way forward.
THOR will help by convening meetings with all interested parties in the community, including research institutions, funders, data centres, publishers, standards bodies, and existing organisation-identifier and other identifier providers.
80. Bibliometrics for Data – what counts and what doesn't?
Sarah Callaghan
sarah.callaghan@stfc.ac.uk
@sorcha_ni
NISO Working Group Connections LIVE! Research Data Metrics Landscape: An update from the NISO Altmetrics Working Group B: Output Types & Identifiers
Monday, November 16, 11:00 a.m. - 1:00 p.m. (ET)
81. Who are we and why do we care about data?
The UK's Natural Environment Research Council (NERC) funds six data centres which between them have responsibility for the long-term management of NERC's environmental data holdings.
We deal with a variety of environmental measurements, along with the results of model simulations in:
• Atmospheric science
• Earth sciences
• Earth observation
• Marine science
• Polar science
• Terrestrial & freshwater science, hydrology and bioinformatics
• Space weather
82. Data, Reproducibility and Science
Science should be reproducible: other people doing the same experiments in the same way should get the same results.
Observational data is not reproducible (unless you have a time machine!).
Therefore we need access to the data to confirm the science is valid!
http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
83. It used to be "easy"…
Suber cells and mimosa leaves. Robert Hooke, Micrographia, 1665.
The Scientific Papers of William Parsons, Third Earl of Rosse, 1800-1867.
…but datasets have gotten so big, it's not useful to publish them in hard copy anymore.
84. Hard copy of the Human Genome at the Wellcome Collection
85. Creating a dataset is hard work!
"Piled Higher and Deeper" by Jorge Cham, www.phdcomics.com
Managing and archiving data so that it's understandable by other researchers is difficult and time-consuming too.
We want to reward researchers for putting that effort in!
90. Some examples of data (just from the Earth Sciences)
1. Time series, some still being updated, e.g. meteorological measurements
2. Large 4D synthesised datasets, e.g. climate, oceanographic, hydrological and numerical weather prediction model data generated on a supercomputer
3. 2D scans, e.g. satellite data, weather radar data
4. 2D snapshots, e.g. cloud camera
5. Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature
6. Datasets consisting of data from multiple instruments as part of the same measurement campaign
7. Physical samples, e.g. fossils
91. What is a Dataset?
DataCite's definition (http://www.datacite.org/sites/default/files/Business_Models_Principles_v1.0.pdf):
Dataset: "Recorded information, regardless of the form or medium on which it may be recorded including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data." (from the U.S. National Institutes of Health (NIH) Grants Policy Statement, via DataCite's Best Practice Guide for Data Citation)
In my opinion, a dataset is something that is:
• the result of a defined process
• scientifically meaningful
• well-defined (i.e. a clear definition of what is in the dataset and what isn't)
93. Data centre metrics – produced 15th July 2014

Metric | Breakdown | CEDA numbers | Notes
Number of discovery dataset records in the DCS | Quarterly | NEODC 26; BADC 242; UKSSDC 11 | Compliance with NERC data management policy. Reflects how many data sets NERC has. The number of dataset discovery records visible from the NERC data discovery service.
Web site visits | Quarterly | BADC: 61,600; NEODC: 10,200 | Active use and visibility of the data centre. Site visits from standard web log analysis systems, such as webaliser. Sensible web crawler filters should have been applied.
Web site page views | Quarterly | BADC: 219,900; NEODC: 25,800 | See web visits notes.
Queries closed this period | Quarterly | 362 helpdesk queries; 838 dataset applications | Active use and visibility of the data centre. Queries marked as resolved within the quarter. A query is a request for information, a problem or an ad hoc data request.
Queries received in period | Quarterly | 388 helpdesk queries; 860 dataset applications | Active use and visibility of the data centre. See closed query notes.
94. Data centre metrics – produced 15th July 2014 (continued)

Metric | Breakdown | CEDA numbers | Notes
Percent queries dealt with in 3 working days | Quarterly | 84.06% (11.57% resolved after 3 days); 87.67% (10.23% resolved after 3 days) | Responsiveness. See closed query notes.
Queries receiving initial response within 1 working day | Quarterly | Helpdesk: 93.57%; dataset applications: 97.91% | Responsiveness. See closed query notes.
Identifiable users actively downloading | None | Over year to date: BADC 4,065; NEODC 362 | Use and visibility of the data centre. An estimate of the number of users using data access services over the year.
Number of metadata records in data centre web site | None | BADC: 240; NEODC: 33 | INSPIRE compliance. Reflects how many data sets NERC has.
Number of datasets available to view via the data centre web site | None | (Metric in development) | INSPIRE compliance. Usable services.
Number of datasets available to download via the data centre web site | None | (Metric in development) | INSPIRE compliance. Usable services.
95. Data centre metrics – produced 15th July 2014 (continued)

Metric | Breakdown | CEDA numbers | Notes
NERC funded data centre staff (FTE) | None | 14 (estimate for FY 14/15) | Data management costs. Efficiency. Number of full-time equivalent posts employed to perform data centre functions.
Direct costs of Data Stewardship in data centre | None | (Reportable at end of financial year) | Data management costs. Efficiency. Cost to NERC.
Capital expenditure directly related to Data Stewardship at data centre | None | (Reportable at end of financial year) | Data management costs. Efficiency.
Direct receipts from data licenses and sales | None | £0 (CEDA does not charge for data) | Commercial value of data products and services.
Number of projects with Outline Data Management Plans | None | (Metric in development) | Means of tracking projects' adoption of good DM practice. Outline DMP is at proposal stage.
Number of projects with Full Data Management Plans | None | (Metric in development) | Means of tracking projects' adoption of good DM practice. Full DMP is at funded stage.
Users by area | n/a | UK 2,534 (61%); Europe 494 (12%); Rest of the world 1,024 (25%); Unknown 79 (2%) | Active use. Visibility of the data centre internationally. Percentage of user base in terms of geographical spread.
Users by institute type | n/a | University 2,934 (71%); Government 694 (17%); NERC 160 (4%); Other 277 (7%); Commercial 42 (1%); School 35 (1%) | Active use. Visibility of the data centre sectorially. Percentage of user base in terms of the users' host institute type.
96. After the data is downloaded, what happens then?
Short answer: we don't know!!
Unless the data user comes back to tell us, or we stumble across a paper which:
• cites us
• or mentions us in a way that we can find
• and tells us which dataset the authors used.
This is why we're working with other groups (like CODATA, FORCE11, RDA, DataCite, Thomson Reuters, …) to promote data citation.
97. How we (NERC) cite data
We use digital object identifiers (DOIs) as part of our dataset citation because:
• they are actionable, interoperable, persistent links for (digital) objects
• scientists are already used to citing papers using DOIs (and they trust them)
• academic journal publishers are starting to require that datasets be cited in a stable way, i.e. using DOIs
• we have a good working relationship with the British Library and DataCite
NERC's guidance on citing data and assigning DOIs can be found at:
http://www.nerc.ac.uk/research/sites/data/doi.asp
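A dataset DOI minted through DataCite can also be turned into a citation programmatically. A minimal sketch, assuming the current DataCite REST API (which postdates this talk) and using a well-known Dryad dataset DOI as a stand-in for a NERC one:

```python
# Minimal sketch, assuming the current DataCite REST API; swap in a NERC
# dataset DOI. The Dryad DOI below is a widely cited example dataset.
import requests

doi = "10.5061/dryad.8515"

resp = requests.get(f"https://api.datacite.org/dois/{doi}")
resp.raise_for_status()
attrs = resp.json()["data"]["attributes"]

creators = "; ".join(c.get("name", "") for c in attrs.get("creators", []))
print(f'{creators} ({attrs.get("publicationYear")}). '
      f'{attrs["titles"][0]["title"]}. {attrs.get("publisher")}. '
      f"https://doi.org/{doi}")
```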
98. [Screenshot: dataset catalogue page, which is also the DOI landing page, showing the dataset citation and a clickable link to the dataset in the archive]
101. Data metrics – the state of the art!
Data citation isn't common practice (unfortunately), and data citation counts don't exist yet.
To count how often BADC data is used we have to:
1. Search Google Scholar for "BADC" and "British Atmospheric Data Centre"
2. Scan the results and weed out false positives
3. Read the papers to figure out what datasets the authors are talking about (if we can)
4. Count the mentions and citations (if any)
http://www.lol-cat.org/little-lovely-lolcat-and-big-work/
We're working with DataCite and Thomson Reuters to get data citation counts.
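A first step toward automating that manual search can be sketched against the CrossRef works API (an assumption on my part, not CEDA's actual workflow). It matches only deposited metadata, not full text, so the hits are a floor and still need the manual weeding of steps 2-4:

```python
# Hedged sketch: count candidate papers whose CrossRef metadata mentions the
# data centre. Not a citation count; results need manual review.
import requests

resp = requests.get(
    "https://api.crossref.org/works",
    params={"query.bibliographic": "British Atmospheric Data Centre", "rows": 5},
)
resp.raise_for_status()
message = resp.json()["message"]

print("candidate mentions:", message["total-results"])
for item in message["items"]:
    print("-", item.get("title", ["(untitled)"])[0])
```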
102. Altmetrics and social media for data?
We are mainly focussing on citation as a first step, as it's the most commonly accepted by researchers.
We have a social media presence, @CEDAnews, mainly used for announcements about service availability.
We definitely want ways of showing our funders that we provide a good service to our users and the research community.
And we want to be able to tell our depositors what impact their data has had!
103. RDA/WDS WG Bibliometrics Survey Results: Mostly Expected
Citations are the preferred metric, downloads next. Standards are missing. Culture change is needed.
[Bar chart, "What do you currently use to evaluate the impact of data?": Nothing; data citation counts; downloads; social media (likes/shares/tweets); mentions in peer-reviewed papers; hits in search engines; mentions in blogs; bookmarks in Zotero and/or Mendeley; other (please specify)]
[Pie chart, "Are the methods you use to evaluate impact adequate for your needs?": Yes 31.5%, No 68.5%]
104. Other projects in the data metrics space
1. CASRAI data level metrics
2. PLOS Making Data Count
3. NISO altmetrics
4. Jisc Giving Researchers Credit for their Data
105. Next steps for Bibliometrics for Data WG
Will be based on:
• WG survey results (presented at RDA P4 and P5)
• Spreadsheet of metrics being collected by repositories (still open for contributions! http://bit.ly/1MpyW4K)
• Shared results from other projects: understanding the challenges and answering the questions posed in the case statement
• Preliminary analysis of data DOI resolutions
• Supporting and evaluating tools from other projects
• Preliminary guidance for the community: "minimal" rather than "best" practice, to get people discussing the issues and coming up with solutions!
106. Thanks! Any questions?
sarah.callaghan@stfc.ac.uk
@sorcha_ni
http://citingbytes.blogspot.co.uk/
Image credit: Borepatch http://borepatch.blogspot.com/2010/06/its-not-what-you-dont-know-that-hurts.html
"Publishing research without data is simply advertising, not science" – Graham Steel
http://blog.okfn.org/2013/09/03/publishing-research-without-data-is-simply-advertising-not-science/
107. Getting (and giving) credit for all that we do
Melissa Haendel
NISO Research Data Metrics Landscape: An update from the NISO Altmetrics Working Group B: Output Types & Identifiers
11.16.2015
@ontowonka
114. Many contributions don't lead to authorship
[Figure: BD2K co-authorship network; D. Eichmann; N. Vasilevsky]
20% of key personnel are not adequately profiled using publications.
115. Some contributions are anonymous
Examples: data deposition, anonymous review.
Image credit: http://disruptiveviews.com/is-your-data-anonymous-or-just-encrypted/
116. The Research Life Cycle
[Cycle diagram: EXPERIMENT, CONSULT, PUBLISH, DATA, FUND]
118. Evidence of meaningful impact
Diverse outputs, diverse impacts, diverse roles: each a critical component of the research process.
Example outputs:
• Measurement instruments
• Continuing education materials
• Cost-effective interventions
• Changes in delivery of healthcare services
• Quality measure guidelines
• Gray literature
• New experimental methods, data models, databases, software tools
• New diagnostic criteria
• New standards of care
• Biological materials, animal models
• Consent documents
• Clinical/practice guidelines
https://becker.wustl.edu/impact-assessment
http://nucats.northwestern.edu/
119. Attribution workshop results: >500 scholarly products
EXAMPLE OUTPUTS related to software:
Outputs: binary redistribution package (installer), algorithm, data analytic software tool, analysis scripts, data cleaning, APIs, codebook (for content analysis), source code, software to make metadata for libraries, archives and museums, program codes (for modeling), commentary in code (for open source: a need to attribute code authors and commentators/enhancers/hackers, who can document what they did and why), computer language (a syntax to describe a set of operations or activities), software patch (set of changes to code to fix bugs, add features, etc.), digital workflow (automated sequence of programs, steps to an outcome), software library (non-stand-alone code that can be incorporated into something larger), software application (computer code that accomplishes something)
Roles: catalog, design, develop, test, hacker, bug finder, software developer, software engineer, developer, programmer, system administrator, execute, document, software package maintainer, project manager, database administrator
121. Modeling & implementation
VIVO-ISF: a suite of ontologies that integrates and extends community standards
122. Credit extends beyond the original contribution
Stacy creates mouse1. Kristi creates mouse2.
Karen performs RNAseq analysis on mouse1 and mouse2 to generate dataset3, which she subsequently curates and analyzes.
Karen writes publication pmid:12345 about the results of her analysis.
Karen explicitly credits Stacy as an author, but not Kristi.
123. Credit is connected
Credit to Stacy is asserted, but credit to Kristi can be inferred.
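A toy sketch of that inference (names and predicates invented for illustration, not the VIVO-ISF/openRIF vocabulary): represent the asserted facts as triples and walk the derivation links to recover Kristi's unasserted credit.

```python
# Toy triples encoding the slide-122 story; only Stacy's credit is asserted.
TRIPLES = [
    ("Stacy", "created", "mouse1"),
    ("Kristi", "created", "mouse2"),
    ("dataset3", "derived_from", "mouse1"),
    ("dataset3", "derived_from", "mouse2"),
    ("pmid:12345", "derived_from", "dataset3"),
    ("Stacy", "author_of", "pmid:12345"),  # asserted credit
]

def contributors(artifact):
    """Everyone whose creations the artifact (transitively) derives from."""
    found = set()
    for s, p, o in TRIPLES:
        if s == artifact and p == "derived_from":
            found |= {c for c, p2, o2 in TRIPLES if p2 == "created" and o2 == o}
            found |= contributors(o)
    return found

# Kristi's credit is inferred from the derivation chain: {'Stacy', 'Kristi'}
print(contributors("pmid:12345"))
```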
124. Introducing openRIF
The Open Research Information Framework
[Diagram: openRIF encompassing VIVO-ISF, eagle-i, and SciENcv]
125. Ensuring an openRIF that meets community needs
Interoperability:
• a domain-configurable suite of ontologies to enable interoperability across systems
• a community of developers, tools, data providers, and end-users
126. Developing a computable research ecosystem
Research information is scattered amongst:
• research networking tools
• citation databases (e.g., PubMed)
• award databases (e.g., NIH RePORTER)
• curated archives (e.g., GenBank)
• text (the research literature), where it is locked up
Map the SciENcv data model to VIVO-ISF/openRIF; enable bi-directional data exchange; integrate SciENcv and ORCID data into CTSAsearch.
CTSAsearch (David Eichmann): http://research.icts.uiowa.edu/polyglot/
127. Thank you!
Join the FORCE11 Attribution Working Group at: https://www.force11.org/group/attributionwg
Join the openRIF listserv at: http://group.openrif.org
128. Identifying those scholarly outputs
Identifiers for things that are not publications or documents need to get beyond thinking about DOIs.