NISO Working Group Connection LIVE!
Research Data Metrics Landscape:
An update from the NISO Altmetrics Working Group B: Output Types &
Identifiers
Monday, November 16, 2015
Presenters:
Kristi Holmes, PhD, Director, Galter Health Sciences Library, Northwestern University
Mike Taylor, Senior Product Manager, Informetrics, Elsevier
Philippe Rocca-Serra, Ph.D., Technical Project Leader, Oxford
Tom Demeranville, THOR Senior Project Officer & ORCiD Software Engineer
Martin Fenner, Technical Director, DataCite
Dr. Sarah Callaghan, Senior Researcher and Project Manager, British Atmospheric Data Centre
Dr. Melissa Haendel, Associate Professor, Ontology Development Group, OHSU Library, Dept of
Medical Informatics and Clinical Epidemiology, Oregon Health & Science University
http://www.niso.org/news/events/2015/wg_connections_live/altmetrics_wgb/
Thank you!
Data-Level Metrics
Martin Fenner
DataCite Technical Director
http://orcid.org/0000-0003-1419-2405
Project Partners
California Digital Library, PLOS, DataONE
National Science Foundation Grant 1448821
http://www.nsf.gov/awardsearch/showAward?AWD_ID=1448821
Project Page
http://mdc.lagotto.io
Making Data Count
MDC Team
Stephen Abrams
Matt Jones
Peter Slaughter
John Kratz
Dave Vieglais
Jennifer Lin
John Chodacki
Patricia Cruse
Martin Fenner
Kristen Ratan
Carly Strasser
Project ends February 29, 2016
Goals
What metrics for research data do researchers and data managers want?
Do data repositories make these metrics available?
If not, build services to collect these metrics for the DataONE repository network
How interested would you be to know each of the following about the impact of your data?
http://doi.org/10.1038/sdata.2015.39
http://www.dx.doi.org/10.5060/D8H59D
What metrics/statistics does your repository currently track and expose?
http://doi.org/10.1038/sdata.2015.39
http://www.dx.doi.org/10.5060/D8H59D
Citations
Metadata of datasets
https://search.labs.datacite.org/?q=10.5061%2FDRYAD.KG943
Metadata of articles
References are part of the metadata deposited to CrossRef
The Cited-by service aggregates these citations for CrossRef DOIs
Work is underway to exchange DOI<->DOI links between CrossRef and DataCite
https://cls.labs.datacite.org
http://det.crossref.org
DOI<->DOI links are stored outside of the DataCite and CrossRef Metadata Stores
Full-text search
http://dlm.labs.datacite.org/works/http://doi.org/10.5061/dryad.f1cb2
Second-order events
http://dlm.labs.datacite.org/sources/pmceurope
Downloads
https://www.dataone.org/
Usage Stats
aggregate DataONE usage log files from DataONE member nodes
parse logs, applying COUNTER rules
• double-click intervals
• whitelist user agents
two versions of usage stats:
• COUNTER-compliant
• partially compliant (include some …)
Average % not filtered:
since 2005: COUNTER 63.57%, Partial 63.59%
this past year: COUNTER 44.88%, Partial 47.05%
Usage Stats
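The COUNTER-style log filtering described above can be sketched in a few lines. This is an illustrative sketch only: the robot-agent terms, the 30-second double-click window, and the event format are assumptions for the example, not the actual Making Data Count implementation.

```python
from datetime import datetime, timedelta

ROBOT_AGENTS = {"crawler", "spider", "bot"}  # assumed blocklist terms
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)  # assumed interval

def counter_filter(events):
    """Keep download events that pass the robot filter, collapsing repeat
    requests by the same user for the same dataset within the window."""
    kept, last_seen = [], {}
    for when, user, dataset, agent in sorted(events):
        if any(term in agent.lower() for term in ROBOT_AGENTS):
            continue  # drop robot traffic
        key = (user, dataset)
        prev = last_seen.get(key)
        last_seen[key] = when
        if prev is not None and when - prev <= DOUBLE_CLICK_WINDOW:
            continue  # double-click: count only once
        kept.append((when, user, dataset))
    return kept

t0 = datetime(2015, 11, 16, 12, 0, 0)
events = [
    (t0, "u1", "doi:10.5061/dryad.f1cb2", "Mozilla/5.0"),
    (t0 + timedelta(seconds=10), "u1", "doi:10.5061/dryad.f1cb2", "Mozilla/5.0"),
    (t0 + timedelta(minutes=5), "u2", "doi:10.5061/dryad.f1cb2", "GoogleBot"),
]
print(len(counter_filter(events)))  # 1: one double-click and one robot hit dropped
```

The "partially compliant" variant above would simply skip one of these two filters before aggregating.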
Future Work
• Collect data citations from CrossRef
• Analyze usage statistics in more detail and provide input to COUNTER and NISO
• Analyze the network graph, e.g. linked datasets and second-order citations
• Turn the research project into a service, including integration of client applications for search and reporting
Introducing the
Metadata Model v1
Philippe Rocca-Serra PhD,
University of Oxford e-Research Centre
on behalf of WG3 Metadata WG
Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego
A trans-NIH funding initiative established
to enable biomedical research as a digital
research enterprise
• Facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable
• Conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data
A catalog to enable researchers to find and cite research datasets
Ease the use of community standards to annotate datasets
Lucila Ohno-Machado (PI)
Jeff Grethe
Pilot applications that ‘dock’ with the prototype and
community-driven activities via Working Groups:
1. BD2K Centers of Excellence Collaboration
2. Data Identifiers Recommendation
3. Metadata Specifications
4. Use Cases and Testing Benchmarks
5. Dataset Citation Metrics
6. Criteria for Being Included in the DDI
7. Machine Actionable Licenses
8. Ranking Algorithm
9. End User Evaluation Criteria
10. Repository Collaboration
11. Outreach Meeting: Repository Operators
12. Standard-driven Curation Best Practices
13. Evaluation of Harvesting and NLP Pilot Projects
All this by
August 2017!
 Joint effort with BD2K Center for Expanded
Data Annotation and Retrieval (CEDAR)
 Synergies with BD2K cross-centers
Metadata WG (co-chaired by M
Musen/CEDAR, G Alter/bioCADDIE) and
ELIXIR activities
WG3 Metadata - Goals
 Define a set of metadata specifications that support the intended capability of the Data Discovery Index prototype - being designed by the bioCADDIE Core Development Team - as outlined in the White Paper
 Core metadata, designed to be future-proofed for progressive
extensions (phase 1: May-July 2015)
 Followed by test and implementation phase
 Domain specific metadata for more specialized data types (phase 2)
 Use cases and the competency questions have been used
throughout the process
 To define the appropriate boundaries and level of granularity:
which queries will be answered in full, which only partially, and
which are out of scope
WG3 Metadata – work to date
with contributions and comments from several WG3 members and colleagues, in particular: Joan Starr, George Alter, Ian Fore, Kevin Read, Stian Soiland-Reyes, Muhammad Amith, Michel Dumontier…
By:
 Contains lists of material reviewed
• data discovery initiatives and metadata initiatives
• existing meta-models for representing metadata elements
 Outlines the approach used to identify metadata descriptors
• Via use cases and competency questions (top-down
approach)
• Mapping generic and life science-specific metadata
schemas (bottom-up approach)
 Listed in the BioSharing collection for bioCADDIE
 The results of both approaches have been compared and converged on the core set of metadata
Standard Operating Procedure (SOP)
List of Metadata Schemas considered
• schema.org
• datacite
• hcls dataset descriptors
• biosample
• geo miniml
• prideml
• isatab/magetab
• ga4gh metadata schema
• sra xml
• bioproject
• cdisc sdm / element of bridge model
Bottom-up approach:
survey of existing models
 Selected competency questions
 representative set from use cases workshop, white paper, submitted by the
community and from Phil Bourne
 questions have been abstracted and key metadata elements have been
highlighted and color-coded and categorized
 as the set of core and extended metadata elements is defined, it will become clearer which questions the Data Discovery Index will not be able to answer in full and which only in part.
Use Cases and Derived Metadata
Processing use cases
All use cases on equal footing
Term Binning
Material
Process
Information
Property
Relation identification
 Core metadata elements and initial model
 the combined approaches have delivered a set of core metadata elements; these will be extended to domain-specific ones in phase two, as needed
 we aim for maximum coverage of use cases with a minimal number of data elements, but we foresee that not all questions can be answered in full
Initial Set of Metadata Elements
Everything is on GitHub
Formal specifications
metadata schema in JSON
• https://github.com/biocaddie/WG3-MetadataSpecifications/tree/master/json-schemas
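Because the core elements are published as JSON schemas, a harvester can check records against them mechanically. A minimal stdlib-only sketch of that idea; the field names here are illustrative assumptions, not the actual WG3 element names (those live in the json-schemas repository above):

```python
import json

# Assumed core fields, for illustration only (see the WG3 json-schemas
# repository for the real element names).
CORE_FIELDS = {"identifier", "title", "creators", "dates"}

def missing_core_fields(record_json: str) -> set:
    """Return the assumed core fields absent from a metadata record."""
    record = json.loads(record_json)
    return CORE_FIELDS - record.keys()

record = json.dumps({
    "identifier": "doi:10.5061/dryad.f1cb2",
    "title": "Example dataset",
    "creators": ["A. Researcher"],
})
print(missing_core_fields(record))  # the 'dates' field is missing
```

A full validator would of course use the published schemas with a JSON Schema library rather than a hand-written field list.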
What’s next?
 With this work phase 1 has been completed
 We have entered the evaluation phase
 the model will be implemented and tested by the
bioCADDIE Development Team with a number of data
sources
 the results will inform the activities in phase 2, where
the metadata elements and the model may be
revised, simplified and/or enriched, as needed
Take Home Message
• primary goal: provide a general-purpose metadata schema that allows harvesting of key experimental and data descriptors from a variety of resources and enables indexing to support data discovery
– relations between authors, datasets, publications and funding sources
– nature of biological signal, nature of perturbation,
Outstanding issues
• prioritizing the use cases
• defining mechanisms to deal with domain specific,
granular data
• moving into phase 2 and devising data ingesters
– ETL activities
– interact with other modeling efforts
• incorporate feedback from users and developers
Question Time
orcid.org | Contact Info: p. +1-301-922-9062, a. 10411 Motor City Drive, Suite 750, Bethesda, MD 20817 USA
ORCID, Metrics and Project THOR
Tom Demeranville
Senior Technical Officer – Project THOR
NISO Webinar, November 2015
Start Here
What is ORCID?
orcid.org | 16 November 2015
ORCID is an infrastructure that provides unique person identifiers. ORCID is a hub for linking identifiers for people with their activities. ORCID is researcher-centric, with 1.7 million registered identifiers.
ORCID records are managed by the researchers themselves.
ORCID is open source, community governed and non-profit.
ORCID has a public API that allows querying of non-private data. ORCID has a member API that enables updating and notifications. ORCID iDs are associated with over 4 million unique DOIs.
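ORCID iDs are themselves machine-checkable: the final character is an ISO 7064 mod 11-2 check digit, per ORCID's published checksum rule. A minimal sketch, exercised against the presenter iD that appears earlier in this deck:

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 mod 11-2 check digit for the first 15 digits
    of an ORCID iD (ORCID's documented checksum algorithm)."""
    total = 0
    for d in base_digits:
        total = (total + int(d)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Validate a hyphenated 16-character ORCID iD against its check digit."""
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:-1]) == digits[-1]

print(is_valid_orcid("0000-0003-1419-2405"))  # True: Martin Fenner's iD above
```

Services that ingest ORCID iDs from free-text metadata can use this check to reject transcription errors before any API lookup.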
347 members, 4 national consortia, over 200 integrations
Members by sector: research institutions 68%, publishers 12%, repositories 9%, associations 6%, funders 5%
Members by region: Europe 58%, North America 26%, Pacific 7%, Asia 5%, MEA 3%, Latin America 1%
What ORCID isn’t
ORCID is not a CRIS system
ORCID is not a researcher profile system
ORCID is not a research activity metadata store
Research outputs
• ORCID includes links to
publications, patents, datasets,
software and more.
• ORCID uses the CASRAI Output
vocabulary for work types
• ORCID references over 20 other output
identifiers (more are being added!)
Other researcher activities
• Peer review
• Education
• Employment
ORCID and Metrics
ORCID doesn’t track metrics – it’s not our focus
ORCID is an enabling infrastructure
ORCID improves robustness of metrics
ORCID and Metrics
• ORCID improves the quality of research
information and makes gathering it and
disseminating it easier.
• Other services use ORCID IDs to improve their
data
• ORCID IDs are found in DOI metadata, funder
systems, publishers, CRIS systems, national
reporting frameworks and more
• Institutions can discover researcher-curated standard and non-standard outputs, or be notified when they are added
Project THOR
http://project-thor.eu
EC funded H2020 2.5 year project
Establish seamless integration between articles, data,
and researchers across the research lifecycle
Make persistent identifier use for people and research
artefacts the default
Both human and technical in scope
http://project-thor.eu
Better identifiers == Better Metrics
http://project-thor.eu
What THOR are up to
http://project-thor.eu
Research - Deciding what needs to be done
Integration - Doing what needs to be done
Outreach - Getting others involved
Sustainability - Making sure it lasts
Organisation identifiers
http://project-thor.eu
Organisation identifiers are important for all
areas of scholarly communication, including
metrics.
The organisation identifier landscape is
fragmented. There are gaps.
It’s a hard problem. Everyone knows this.
Organisation identifiers
http://project-thor.eu
Community driven consensus on requirements is
needed.
We need a way forward.
THOR will help by convening meetings with all
interested parties in the community, including research
institutions, funders, datacentres, publishers,
standards bodies, existing organisation identifier and
other identifier providers.
Thanks
http://project-thor.eu
@tomdemeranville
t.demeranville@orcid-eu.org
VO Sandpit, November 2009
Bibliometrics for Data – what
counts and what doesn’t?
Sarah Callaghan
sarah.callaghan@stfc.ac.uk
@sorcha_ni
NISO Working Group Connections LIVE!
Research Data Metrics Landscape:
An update from the NISO Altmetrics Working Group B: Output Types &
Identifiers
Monday, November 16 from 11:00 a.m. - 1:00 p.m. (ET)
The UK’s Natural Environment Research Council (NERC)
funds six data centres which between them have
responsibility for the long-term management of NERC's
environmental data holdings.
We deal with a variety of environmental measurements,
along with the results of model simulations in:
•Atmospheric science
•Earth sciences
•Earth observation
•Marine Science
•Polar Science
•Terrestrial & freshwater science, Hydrology and
Bioinformatics
•Space Weather
Who are we and why do we
care about data?
Data, Reproducibility and Science
Science should be reproducible –
other people doing the same
experiments in the same way
should get the same results.
Observational data is not
reproducible (unless you have a
time machine!)
Therefore we need to have access to the data to confirm the science is valid!
Image credit: http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
It used to be “easy”…
Suber cells and mimosa leaves. Robert
Hooke, Micrographia, 1665
The Scientific Papers of William Parsons,
Third Earl of Rosse 1800-1867
…but datasets have gotten so big, it’s not
useful to publish them in hard copy anymore
Hard copy of the Human Genome at
the Wellcome Collection
Creating a dataset is hard
work!
"Piled Higher and Deeper" by Jorge Cham
www.phdcomics.com
Managing and archiving data so that it’s understandable by other
researchers is difficult and time consuming too.
We want to reward researchers for putting that effort in!
Most people have an idea of what a
publication is
Some examples of data (just from
the Earth Sciences)
1. Time series, some still being updated
e.g. meteorological measurements
2. Large 4D synthesised datasets, e.g.
Climate, Oceanographic, Hydrological
and Numerical Weather Prediction
model data generated on a
supercomputer
3. 2D scans e.g. satellite data, weather
radar data
4. 2D snapshots, e.g. cloud camera
5. Traces through a changing medium,
e.g. radiosonde launches, aircraft
flights, ocean salinity and temperature
6. Datasets consisting of data from
multiple instruments as part of the
same measurement campaign
7. Physical samples, e.g. fossils
What is a Dataset?
DataCite’s definition
(http://www.datacite.org/sites/default/files/Business_Models_Principles_v1.0.pdf):
Dataset: "Recorded information, regardless of
the form or medium on which it may be
recorded including writings, films, sound
recordings, pictorial reproductions,
drawings, designs, or other graphic
representations, procedural manuals, forms,
diagrams, work flow, charts, equipment
descriptions, data files, data processing or
computer programs (software), statistical
records, and other research data."
(from the U.S. National Institutes of Health (NIH)
Grants Policy Statement via DataCite's Best
Practice Guide for Data Citation).
In my opinion a dataset is
something that is:
•The result of a defined process
•Scientifically meaningful
•Well-defined (i.e. clear
definition of what is in the
dataset and what isn’t)
What metrics do we use for our data?
Data centre metrics – produced 15th July 2014

Metric | Breakdown | CEDA numbers | Notes
Number of discovery dataset records in the DCS | Quarterly | NEODC 26; BADC 242; UKSSDC 11 | Compliance with NERC data management policy. Reflects how many data sets NERC has. The number of dataset discovery records visible from the NERC data discovery service.
Web site visits | Quarterly | BADC: 61,600; NEODC: 10,200 | Active use and visibility of the data centre. Site visits from standard web log analysis systems, such as webalizer. Sensible web crawler filters should have been applied.
Web site page views | Quarterly | BADC: 219,900; NEODC: 25,800 | See web site visits notes.
Queries closed this period | Quarterly | 362 helpdesk queries; 838 dataset applications | Active use and visibility of the data centre. Queries marked as resolved within the quarter. A query is a request for information, a problem or an ad hoc data request.
Queries received in period | Quarterly | 388 helpdesk queries; 860 dataset applications | Active use and visibility of the data centre. See closed query notes.
Metric | Breakdown | CEDA numbers | Notes
Percent queries dealt with in 3 working days | Quarterly | 84.06% (11.57% resolved after 3 days); 87.67% (10.23% resolved after 3 days) | Responsiveness. See closed query notes.
Queries receiving initial response within 1 working day | - | Helpdesk: 93.57%; Dataset applications: 97.91% | Responsiveness. See closed query notes.
Identifiable users actively downloading | None | Over year to date: BADC: 4065; NEODC: 362 | Use and visibility of the data centre. An estimate of the number of users using data access services over the year.
Number of metadata records in data centre web site | None | BADC: 240; NEODC: 33 | INSPIRE compliance. Reflects how many data sets NERC has.
Number of datasets available to view via the data centre web site | None | (Metric in development) | INSPIRE compliance. Usable services.
Number of datasets available to download via the data centre web site | None | (Metric in development) | INSPIRE compliance. Usable services.
Metric | Breakdown | CEDA numbers | Notes
NERC funded data centre staff (FTE) | None | 14 (estimate for FY 14/15) | Data management costs. Efficiency. Number of full time equivalent posts employed to perform data centre functions.
Direct costs of Data Stewardship in data centre | None | (reportable at end of financial year) | Data management costs. Efficiency. Cost to NERC.
Capital Expenditure directly related to Data Stewardship at data centre | None | (reportable at end of financial year) | Data management costs. Efficiency.
Direct Receipts from Data Licenses and Sales | None | £0 (CEDA does not charge for data) | Commercial value of data products and services.
Number of projects with Outline Data Management Plans | None | (Metric in development) | Means of tracking projects’ adoption of good DM practice. Outline DMP is at proposal stage.
Number of projects with Full Data Management Plans | None | (Metric in development) | Means of tracking projects’ adoption of good DM practice. Full DMP is at funded stage.
Users by area | UK 2534 (61%); Europe 494 (12%); Rest of the world 1024 (25%); Unknown 79 (2%) | Active use. Visibility of the data centre internationally. Percentage of user base in terms of geographical spread.
Users by institute type | University 2934 (71%); Government 694 (17%); NERC 160 (4%); Other 277 (7%); Commercial 42 (1%); School 35 (1%) | Active use. Visibility of the data centre sectorially. Percentage of user base in terms of the users’ host institute type.
Short answer:
We don’t know!!
Unless the data user comes back to us to tell us.
Or we stumble across a paper which
• Cites us
• Or mentions us in a way that we can find
• And tells us what dataset the authors used.
This is why we’re working with other groups (like CODATA, Force11, RDA, DataCite, Thomson Reuters, …) to promote data citation.
After the data is downloaded,
what happens then?
How we (NERC) cite
data
We use digital object identifiers (DOIs) as part of our dataset citations because:
• They are actionable, interoperable,
persistent links for (digital) objects
• Scientists are already used to citing
papers using DOIs (and they trust
them)
• Academic journal publishers are
starting to require datasets be cited in
a stable way, i.e. using DOIs.
• We have a good working relationship
with the British Library and DataCite
NERC’s guidance on citing data and assigning DOIs can be found at:
http://www.nerc.ac.uk/research/sites/data/doi.asp
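A DOI-based dataset citation can be assembled mechanically from catalogue metadata. A sketch in the commonly recommended DataCite style (Creator (PublicationYear): Title. Publisher. Identifier); the metadata values below are hypothetical, not a real NERC dataset record:

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a dataset citation in the widely used DataCite style:
    Creator (PublicationYear): Title. Publisher. Identifier."""
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. http://doi.org/{doi}"

# Hypothetical metadata values, for illustration only.
citation = format_data_citation(
    creators=["Callaghan, S."],
    year=2014,
    title="Example atmospheric dataset",
    publisher="NERC British Atmospheric Data Centre",
    doi="10.5285/EXAMPLE",
)
print(citation)
```

Rendering the DOI as an actionable http://doi.org/ link is what makes the citation resolvable from the reference list back to the landing page.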
Dataset
catalogue page
(and DOI landing
page)
Dataset citation
Clickable link to
Dataset in the archive
Another example
of a cited dataset
Data metrics – the state of the art!
Data citation isn’t common practice
(unfortunately)
Data citation counts don’t exist yet
To count how often BADC data is used
we have to:
1. Search Google Scholar for “BADC”,
“British Atmospheric Data Centre”
2. Scan the results and weed out false
positives
3. Read the papers to figure out what
datasets the authors are talking
about (if we can)
4. Count the mentions and citations (if
any)
http://www.lol-cat.org/little-lovely-lolcat-and-big-work/
We’re working with DataCite and Thomson Reuters to get data citation counts.
Altmetrics and social media for data?
Mainly focussing on citation as a first
step, as it’s most commonly
accepted by researchers.
We have a social media presence
@CEDAnews
- Mainly used for announcements about
service availability
We definitely want ways of showing our
funders that we provide a good
service to our users and the research
community.
And we want to be able to
tell our depositors what
impact their data has had!
RDA/WDS WG Bibliometrics Survey
Results: Mostly Expected
Citations are preferred metrics, downloads next.
Standards are missing.
Culture change is needed.
Chart: “What do you currently use to evaluate the impact of data?” Options surveyed: nothing, data citation counts, downloads, social media (likes/shares/tweets), mentions in peer-reviewed papers, hits in search engines, mentions in blogs, bookmarks in Zotero and/or Mendeley, other (please specify).
Chart: “Are the methods you use to evaluate impact adequate for your needs?” Yes 31.5%, No 68.5%.
Other projects in the data metrics
space
1. CASRAI data level metrics
2. PLOS Making Data Count
3. NISO altmetrics
4. Jisc Giving Researchers Credit for their Data
Next steps for Bibliometrics for
Data WG
Will be based on:
• WG survey results (presented at RDA P4 and P5)
• Spreadsheet of metrics being collected by repositories - Still open
for contributions! http://bit.ly/1MpyW4K
• Shared results from other projects – understanding the challenges
and answering the questions posed in the case statement
• Preliminary analysis of data DOI resolutions
• Supporting and evaluating tools from other projects
• Preliminary guidance for the community - “minimal” rather than
“best” practice – get people discussing the issues and coming up
with solutions!
Thanks!
Any questions?
sarah.callaghan@stfc.ac.uk
@sorcha_ni
http://citingbytes.blogspot.co.uk/
Image credit: Borepatch http://borepatch.blogspot.com/2010/06/its-not-what-you-dont-know-that-hurts.html
“Publishing research without data is simply
advertising, not science” - Graham Steel
http://blog.okfn.org/2013/09/03/publishing-research-without-data-is-simply-advertising-not-science/
Getting (and giving) credit for all that we do
Melissa Haendel
NISO Research Data Metrics Landscape:
An update from the NISO Altmetrics Working Group B:
Output Types & Identifiers
11.16.2015
@ontowonka
What *IS* “success”?
https://goo.gl/b60moX
It’s not always what you see
What is attribution???
Over 1000 authors
Project CRediT
http://projectcredit.net
Many contributions don’t lead to authorship
BD2K co-authorship
D.Eichmann
N.Vasilevsky
20% of key personnel are not adequately profiled using publications
Some contributions are anonymous
Data deposition
Image credit: http://disruptiveviews.com/is-your-data-anonymous-or-just-encrypted/
Anonymous review
The Research Life Cycle (cycle diagram): EXPERIMENT, CONSULT, PUBLISH, DATA, FUND
• Measurement instruments
• Continuing education materials
• Cost-effective intervention
• Change in delivery of healthcare services
• Quality measure guidelines
• Gray literature
Evidence of meaningful impact
• New experimental methods, data
models, databases, software tools
• New diagnostic criteria
• New standards of care
• Biological materials, animal models
• Consent documents
• Clinical/practice guidelines
https://becker.wustl.edu/impact-assessment
http://nucats.northwestern.edu/
Diverse outputs
Diverse impacts
Diverse roles
Each a critical component of the
research process
EXAMPLE OUTPUTS related to software:
Outputs: binary redistribution package (installer), algorithm, data analytic software tool, analysis scripts, data cleaning, APIs, codebook (for content analysis), source code, software to make metadata for libraries, archives and museums, program codes (for modeling), commentary in code (thinking of open source: the need to attribute code authors and commentators/enhancers/hackers, who can document what they did and why), computer language (a syntax to describe a set of operations or activities), software patch (set of changes to code to fix bugs, add features, etc.), digital workflow (automated sequence of programs, steps to an outcome), software library (non-stand-alone code that can be incorporated into something larger), software application (computer code that accomplishes something)
Roles: catalog, design, develop, test, hacker, bug finder, software developer, software engineer, developer, programmer, system administrator, execute, document, software package maintainer, project manager, database administrator
Attribution workshop results: >500 scholarly products
Connecting people to their “stuff”
Modeling & implementation
VIVO-ISF: Suite of ontologies that integrates and
extends community standards
Credit extends beyond the original
contribution
 Stacy creates mouse1
 Kristi creates mouse2
 Karen performs RNA-seq analysis on mouse1 and mouse2 to generate dataset3, which she subsequently curates and analyzes
 Karen writes publication pmid:12345 about
the results of her analysis
 Karen explicitly credits Stacy as an author but not Kristi.
Credit is connected
Credit to Stacy is asserted, but credit to Kristi can be inferred
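That inference can be sketched as a walk over a small provenance graph. This is a toy model of the Stacy/Kristi/Karen scenario above, not openRIF or VIVO-ISF: each artifact maps to the people and artifacts it was derived from, and indirect contributors (Kristi, via mouse2) fall out of the traversal.

```python
# Toy derivation graph for the example above: output -> inputs/contributors.
derived_from = {
    "mouse1": ["Stacy"],
    "mouse2": ["Kristi"],
    "dataset3": ["Karen", "mouse1", "mouse2"],
    "pmid:12345": ["Karen", "dataset3"],
}

def transitive_contributors(artifact):
    """Collect every person reachable through the derivation graph."""
    people = set()
    stack = [artifact]
    while stack:
        node = stack.pop()
        for parent in derived_from.get(node, []):
            if parent in derived_from:  # another artifact: keep walking
                stack.append(parent)
            else:                       # a person: credit them
                people.add(parent)
    return people

print(sorted(transitive_contributors("pmid:12345")))  # ['Karen', 'Kristi', 'Stacy']
```

Karen's explicit author list credits only Stacy, but the graph surfaces Kristi's contribution as well, which is exactly the connectedness the slide describes.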
Introducing openRIF
The Open Research Information Framework
Diagram: openRIF builds on SciENcv, eagle-i, and VIVO-ISF
Ensuring an openRIF that meets community needs
Interoperability: a domain-configurable suite of ontologies to enable interoperability across systems
A community of developers, tools, data providers, and end-users
Developing a computable research ecosystem
Research information is scattered among:
Research networking tools
Citation databases (e.g., PubMed)
Award databases (e.g., NIH RePORTER)
Curated archives (e.g., GenBank)
Locked up in text (the research literature)
Map the SciENcv data model to VIVO-ISF/openRIF
Enable bi-directional data exchange
Integrate SciENcv and ORCID data into CTSAsearch
http://research.icts.uiowa.edu/polyglot/
CTSAsearch:
The Open Research Information Framework
David Eichmann
Thank you!
Join the FORCE11 Attribution Working Group at:
https://www.force11.org/group/attributionwg
Join the openRIF listserv at: http://group.openrif.org
Identifying those scholarly outputs
For identifiers of things that are not publications or documents, we need to get beyond thinking about DOIs.
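Several such schemes already exist alongside the DOI: ORCID iDs for people and RRIDs for research resources, for example. The sketch below illustrates how identifier strings might be routed to a scheme; the regular expressions are simplified approximations for illustration, not the full specifications.

```python
import re

# Simplified, illustrative patterns -- not the authoritative syntax rules.
PATTERNS = {
    "doi":   re.compile(r"^10\.\d{4,9}/\S+$"),
    "orcid": re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$"),
    "rrid":  re.compile(r"^RRID:\S+$"),
}

def classify(identifier):
    """Return the first scheme whose pattern matches, else 'unknown'."""
    for scheme, pattern in PATTERNS.items():
        if pattern.match(identifier):
            return scheme
    return "unknown"

print(classify("10.5061/DRYAD.KG943"))   # doi
print(classify("0000-0003-1419-2405"))   # orcid
print(classify("RRID:AB_2314866"))       # rrid
```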
  • 1. NISO Working Group Connection LIVE! Research Data Metrics Landscape: An update from the NISO Altmetrics Working Group B: Output Types & Identifiers Monday, November 16, 2015 Presenters: Kristi Holmes, PhD, Director, Galter Health Sciences Library, Northwestern University Mike Taylor, Senior Product Manager, Informetrics, Elsevier Philippe Rocca-Serra, Ph.D., Technical Project Leader, Oxford Tom Demeranville, THOR Senior Project Officer & ORCiD Software Engineer Martin Fenner, Technical Director, DataCite Dr. Sarah Callaghan, Senior Researcher and Project Manager, British Atmospheric Data Centre Dr. Melissa Haendel, Associate Professor, Ontology Development Group, OHSU Library, Dept of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University http://www.niso.org/news/events/2015/wg_connections_live/altmetrics_wgb/
  • 11. Making Data Count. Project Partners: California Digital Library, PLOS, DataONE. National Science Foundation Grant 1448821: http://www.nsf.gov/awardsearch/showAward?AWD_ID=1448821. Project Page: http://mdc.lagotto.io
  • 12. MDC Team: Stephen Abrams, Matt Jones, Peter Slaughter, John Kratz, Dave Vieglais, Jennifer Lin, John Chodacki, Patricia Cruse, Martin Fenner, Kristen Ratan, Carly Strasser. Project ends February 29, 2016
  • 13. Goals: What metrics for research data do researchers and data managers want? Do data repositories make these metrics available? If not, build services to collect these metrics for the DataONE repository network
  • 14. How interested would you be to know each of the following about the impact of your data? http://doi.org/10.1038/sdata.2015.39 http://dx.doi.org/10.5060/D8H59D
  • 15. What metrics/statistics does your repository currently track and expose? http://doi.org/10.1038/sdata.2015.39 http://dx.doi.org/10.5060/D8H59D
  • 18. Metadata of articles: references are part of the metadata deposited to CrossRef. The Cited-by service aggregates these citations for CrossRef DOIs. Work is underway to exchange DOI <-> DOI links between CrossRef and DataCite
  • 19. https://cls.labs.datacite.org http://det.crossref.org DOI <-> DOI links are stored outside of the DataCite and CrossRef Metadata Stores
  • 24. Usage Stats: aggregate DataONE usage log files from DataONE member nodes; parse logs, applying COUNTER rules (double-click intervals, whitelist of user agents). Two versions of usage stats: COUNTER-compliant and partially compliant (include some …)
  • 25. Usage Stats – average % of requests not filtered: since 2005, COUNTER 63.57%, Partial 63.59%; this past year, COUNTER 44.88%, Partial 47.05%
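The COUNTER-style log filtering described in the Usage Stats slides (double-click intervals, user-agent whitelisting) can be sketched roughly as below. This is a minimal illustration: the 30-second window, the bot list, and the log-record shape are assumptions for demonstration, not the Making Data Count project's actual rules.

```python
# Sketch of COUNTER-style filtering of repository download logs.
# The 30-second double-click window and the bot list are illustrative
# assumptions, not the exact rules used by Making Data Count.

BOT_AGENTS = {"googlebot", "bingbot", "curl"}  # hypothetical crawler list
DOUBLE_CLICK_WINDOW = 30  # seconds

def filter_counter(events):
    """Drop bot traffic, and collapse repeat requests for the same
    dataset by the same user within the double-click window."""
    last_seen = {}  # (user, dataset) -> timestamp of last counted hit
    counted = []
    for ts, user, dataset, agent in sorted(events):
        if agent.lower() in BOT_AGENTS:
            continue  # user-agent rule: ignore known crawlers
        key = (user, dataset)
        if key in last_seen and ts - last_seen[key] < DOUBLE_CLICK_WINDOW:
            continue  # double-click rule: collapse rapid repeats
        last_seen[key] = ts
        counted.append((ts, user, dataset))
    return counted

events = [
    (100, "u1", "doi:10.5061/dryad.kg943", "Mozilla/5.0"),
    (110, "u1", "doi:10.5061/dryad.kg943", "Mozilla/5.0"),  # repeat, dropped
    (200, "u1", "doi:10.5061/dryad.kg943", "Mozilla/5.0"),  # counted again
    (105, "bot", "doi:10.5061/dryad.kg943", "Googlebot"),   # bot, dropped
]
print(len(filter_counter(events)))  # → 2
```

A "partially compliant" count, as on the previous slide, would simply skip one of the two `continue` branches.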
  • 26. Future Work • Collect data citations from CrossRef • Analyze usage statistics in more detail and provide input to COUNTER and NISO • Analyze the network graph, e.g. linked datasets and second-order citations • Turn the research project into a service, including integration of client applications for search and reporting
  • 27. Introducing the Metadata Model v1. Philippe Rocca-Serra PhD, University of Oxford e-Research Centre, on behalf of the WG3 Metadata WG. Supported by the NIH grant 1U24 AI117966-01 to the University of California, San Diego
  • 28. A trans-NIH funding initiative established to enable biomedical research as a digital research enterprise • Facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable -> a catalog to enable researchers to find and cite research datasets • Conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data -> ease the use of community standards to annotate datasets
  • 29. Lucila Ohno-Machado (PI), Jeff Grethe
  • 31. Pilot applications that ‘dock’ with the prototype and community-driven activities via Working Groups: 1. BD2K Centers of Excellence Collaboration 2. Data Identifiers Recommendation 3. Metadata Specifications 4. Use Cases and Testing Benchmarks 5. Dataset Citation Metrics 6. Criteria for Being Included in the DDI 7. Machine Actionable Licenses 8. Ranking Algorithm 9. End User Evaluation Criteria 10. Repository Collaboration 11. Outreach Meeting: Repository Operators 12. Standard-driven Curation Best Practices 13. Evaluation of Harvesting and NLP Pilot Projects All this by August 2017!
  • 32.  Joint effort with BD2K Center for Expanded Data Annotation and Retrieval (CEDAR)  Synergies with BD2K cross-centers Metadata WG (co-chaired by M Musen/CEDAR, G Alter/bioCADDIE) and ELIXIR activities
  • 33. WG3 Metadata - Goals  Define a set of metadata specifications that support the intended capability of the Data Discovery Index prototype - being designed by the bioCADDIE Core Development Team - as outlined in the White Paper  Core metadata, designed to be future-proofed for progressive extensions (phase 1: May-July 2015)  Followed by a test and implementation phase  Domain-specific metadata for more specialized data types (phase 2)  Use cases and the competency questions have been used throughout the process  To define the appropriate boundaries and level of granularity: which queries will be answered in full, which only partially, and which are out of scope
  • 34. WG3 Metadata – work to date. With contributions and comments from several WG3 members and colleagues, in particular: Joan Starr, George Alter, Ian Fore, Kevin Read, Stian Soiland-Reyes, Muhammad Amith, Michel Dumontier…
  • 35. Standard Operating Procedure (SOP)  Contains lists of material reviewed • data discovery initiatives and metadata initiatives • existing meta-models for representing metadata elements  Outlines the approach used to identify metadata descriptors • via use cases and competency questions (top-down approach) • mapping generic and life-science-specific metadata schemas (bottom-up approach)  Listed in the BioSharing collection for bioCADDIE  The results of both approaches have been compared and converged on the core set of metadata
  • 37. List of Metadata Schemas considered • schema.org • DataCite • HCLS dataset descriptors • BioSample • GEO MINiML • PRIDE ML • ISA-Tab/MAGE-TAB • GA4GH metadata schema • SRA XML • BioProject • CDISC SDM / elements of the BRIDG model
  • 38. Bottom-up approach: survey of existing models
  • 39. Use Cases and Derived Metadata  Selected competency questions: a representative set from the use cases workshop, the white paper, community submissions, and Phil Bourne  Questions have been abstracted, and key metadata elements have been highlighted, color-coded and categorized  As the set of core and extended metadata elements is defined, it will become clearer which questions the Data Discovery Index will not be able to answer in full and which only in part
  • 41. Processing use cases: all use cases on an equal footing; term binning (Material, Process, Information, Property) and relation identification
  • 42. Initial Set of Metadata Elements  Core metadata elements and initial model: the combination of the two approaches has delivered a set of core metadata elements; progressively these will/could be extended to domain-specific ones in phase two, as needed  We aim for maximum coverage of use cases with a minimal number of data elements, but we foresee that not all questions can be answered in full
  • 43. Initial Set of Metadata Elements
  • 44. Everything is on GitHub
  • 45. Formal specifications: metadata schema in JSON • https://github.com/biocaddie/WG3-MetadataSpecifications/tree/master/json-schemas
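A JSON metadata schema like the one above is, at its core, a set of required fields and types that records must satisfy. A minimal sketch of that idea follows; the field names (`identifier`, `title`, `creators`) are illustrative stand-ins, not the actual bioCADDIE element names.

```python
# Minimal required-fields check for a dataset metadata record.
# Field names here are illustrative, not the bioCADDIE schema's own.
REQUIRED = {"identifier": str, "title": str, "creators": list}

def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected_type in REQUIRED.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    return problems

record = {
    "identifier": "doi:10.5061/dryad.kg943",
    "title": "Example dataset",
    "creators": ["A. Researcher"],
}
print(validate(record))        # → []
print(validate({"title": 42}))  # reports missing/badly typed fields
```

In practice a full JSON Schema validator would replace this hand-rolled check, but the shape of the contract is the same.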
  • 46. What's next?  With this work, phase 1 is complete  We have entered the evaluation phase  The model will be implemented and tested by the bioCADDIE Development Team with a number of data sources  The results will inform the activities in phase 2, where the metadata elements and the model may be revised, simplified and/or enriched, as needed
  • 47. Take Home Message • Primary goal: provide a general-purpose metadata schema to allow harvesting of key experimental and data descriptors from a variety of resources and to enable indexing to support data discovery – relations between authors, datasets, publications and funding sources – nature of biological signal, nature of perturbation, …
  • 48. Outstanding issues • prioritizing the use cases • defining mechanisms to deal with domain-specific, granular data • moving into phase 2 and devising data ingesters – ETL activities – interacting with other modeling efforts • incorporating feedback from users and developers
  • 49. Question Time
  • 50. ORCID, Metrics and Project THOR. Tom Demeranville, Senior Technical Officer – Project THOR. NISO Webinar, November 2015. Contact info: orcid.org; p. +1-301-922-9062; a. 10411 Motor City Drive, Suite 750, Bethesda, MD 20817 USA
  • 51. What is ORCID? ORCID is an infrastructure that provides unique person identifiers. ORCID is a hub for linking identifiers for people with their activities. ORCID is researcher-centric, with 1.7 million registered identifiers. ORCID records are managed by the researchers themselves. ORCID is open source, community-governed and non-profit. ORCID has a public API that allows querying of non-private data. ORCID has a member API that enables updating and notifications. ORCID iDs are associated with over 4 million unique DOIs
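The public API mentioned above exposes a record's works, and the DOIs attached to them, as JSON. The sketch below shows the general shape of reading DOIs out of a works response; the endpoint assumes ORCID's current v3.0 public API, and the sample payload is a simplified stand-in for the real response, so treat both as illustrative.

```python
# Sketch of reading public works from an ORCID record.
# Endpoint shape assumes the ORCID public API v3.0; the sample payload
# below is a simplified stand-in for the real "works" response.
import json

def works_url(orcid_id):
    return f"https://pub.orcid.org/v3.0/{orcid_id}/works"

def extract_dois(works_json):
    """Pull DOIs out of a (simplified) works summary payload."""
    dois = []
    for group in works_json.get("group", []):
        for ext_id in group.get("external-ids", {}).get("external-id", []):
            if ext_id.get("external-id-type") == "doi":
                dois.append(ext_id["external-id-value"])
    return dois

sample = json.loads("""{
  "group": [
    {"external-ids": {"external-id": [
      {"external-id-type": "doi", "external-id-value": "10.1038/sdata.2015.39"}
    ]}}
  ]
}""")
print(works_url("0000-0003-1419-2405"))
print(extract_dois(sample))  # → ['10.1038/sdata.2015.39']
```

A real client would fetch the URL with an `Accept: application/json` header and feed the response body to `extract_dois`.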
  • 52. 347 members, 4 national consortia, over 200 integrations. By sector: research institutions 68%, publishers 12%, repositories 9%, associations 6%, funders 5%. By region: Europe 58%, North America 26%, Pacific 7%, Asia 5%, Middle East & Africa 3%, Latin America 1%
  • 53. What ORCID isn't: ORCID is not a CRIS system. ORCID is not a researcher profile system. ORCID is not a research activity metadata store
  • 54. Research outputs • ORCID includes links to publications, patents, datasets, software and more • ORCID uses the CASRAI Output vocabulary for work types • ORCID references over 20 other output identifiers (more are being added!)
  • 55. Other researcher activities • Peer review • Education • Employment
  • 56. ORCID and Metrics: ORCID doesn't track metrics – it's not our focus. ORCID is an enabling infrastructure. ORCID improves the robustness of metrics
  • 57. ORCID and Metrics • ORCID improves the quality of research information and makes gathering and disseminating it easier • Other services use ORCID iDs to improve their data • ORCID iDs are found in DOI metadata, funder systems, publishers, CRIS systems, national reporting frameworks and more • Institutions can discover researcher-curated standard and non-standard outputs, or be notified when they are added
  • 58. Project THOR http://project-thor.eu – an EC-funded H2020 2.5-year project. Establish seamless integration between articles, data, and researchers across the research lifecycle. Make persistent identifier use for people and research artefacts the default. Both human and technical in scope
  • 60. Better identifiers == Better Metrics http://project-thor.eu
  • 61. What THOR are up to http://project-thor.eu Research - Deciding what needs to be done Integration - Doing what needs to be done Outreach - Getting others involved Sustainability - Making sure it lasts
  • 62. Organisation identifiers http://project-thor.eu Organisation identifiers are important for all areas of scholarly communication, including metrics. The organisation identifier landscape is fragmented. There are gaps. It’s a hard problem. Everyone knows this.
  • 63. Organisation identifiers http://project-thor.eu Community driven consensus on requirements is needed. We need a way forward. THOR will help by convening meetings with all interested parties in the community, including research institutions, funders, datacentres, publishers, standards bodies, existing organisation identifier and other identifier providers.
  • 80. VO Sandpit, November 2009 Bibliometrics for Data – what counts and what doesn’t? Sarah Callaghan sarah.callaghan@stfc.ac.uk @sorcha_ni NISO Working Group Connections LIVE! Research Data Metrics Landscape: An update from the NISO Altmetrics Working Group B: Output Types & Identifiers Monday, November 16 from 11:00 a.m. - 1:00 p.m. (ET)
  • 81. Who are we and why do we care about data? The UK's Natural Environment Research Council (NERC) funds six data centres which between them have responsibility for the long-term management of NERC's environmental data holdings. We deal with a variety of environmental measurements, along with the results of model simulations, in: atmospheric science; Earth sciences; Earth observation; marine science; polar science; terrestrial & freshwater science, hydrology and bioinformatics; space weather
  • 82. Data, Reproducibility and Science. Science should be reproducible – other people doing the same experiments in the same way should get the same results. Observational data is not reproducible (unless you have a time machine!). Therefore we need to have access to the data to confirm the science is valid! http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
  • 83. It used to be "easy"… Suber cells and mimosa leaves, Robert Hooke, Micrographia, 1665. The Scientific Papers of William Parsons, Third Earl of Rosse, 1800-1867. …but datasets have gotten so big, it's not useful to publish them in hard copy anymore
  • 84. Hard copy of the Human Genome at the Wellcome Collection
  • 85. Creating a dataset is hard work! "Piled Higher and Deeper" by Jorge Cham, www.phdcomics.com. Managing and archiving data so that it's understandable by other researchers is difficult and time consuming too. We want to reward researchers for putting that effort in!
  • 86. Most people have an idea of what a publication is
  • 90. Some examples of data (just from the Earth sciences): 1. Time series, some still being updated, e.g. meteorological measurements 2. Large 4D synthesised datasets, e.g. climate, oceanographic, hydrological and numerical weather prediction model data generated on a supercomputer 3. 2D scans, e.g. satellite data, weather radar data 4. 2D snapshots, e.g. cloud camera 5. Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature 6. Datasets consisting of data from multiple instruments as part of the same measurement campaign 7. Physical samples, e.g. fossils
  • 91. What is a Dataset? DataCite's definition (http://www.datacite.org/sites/default/files/Business_Models_Principles_v1.0.pdf): Dataset: "Recorded information, regardless of the form or medium on which it may be recorded including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data." (from the U.S. National Institutes of Health (NIH) Grants Policy Statement via DataCite's Best Practice Guide for Data Citation). In my opinion a dataset is something that is: the result of a defined process; scientifically meaningful; well-defined (i.e. a clear definition of what is in the dataset and what isn't)
  • 92. What metrics do we use for our data?
  • 93. Data centre metrics – produced 15th July 2014:
  – Number of discovery dataset records in the DCS (quarterly): NEODC 26, BADC 242, UKSSDC 11. Compliance with NERC data management policy; reflects how many data sets NERC has. The number of dataset discovery records visible from the NERC data discovery service.
  – Web site visits (quarterly): BADC 61,600; NEODC 10,200. Active use and visibility of the data centre. Site visits from standard web log analysis systems, such as Webalizer; sensible web crawler filters should have been applied.
  – Web site page views (quarterly): BADC 219,900; NEODC 25,800. See web visits notes.
  – Queries closed this period (quarterly): 362 helpdesk queries, 838 dataset applications. Active use and visibility of the data centre. Queries marked as resolved within the quarter; a query is a request for information, a problem or an ad hoc data request.
  – Queries received in period (quarterly): 388 helpdesk queries, 860 dataset applications. Active use and visibility of the data centre. See closed query notes.
  • 94. Data centre metrics – produced 15th July 2014 (continued):
  – Percent of queries dealt with in 3 working days (quarterly): 84.06% (11.57% resolved after 3 days); 87.67% (10.23% resolved after 3 days). Queries receiving initial response within 1 working day: helpdesk 93.57%, dataset applications 97.91%. Responsiveness; see closed query notes.
  – Identifiable users actively downloading (no breakdown): over year to date, BADC 4,065; NEODC 362. Use and visibility of the data centre; an estimate of the number of users using data access services over the year.
  – Number of metadata records in data centre web site (no breakdown): BADC 240, NEODC 33. INSPIRE compliance; reflects how many data sets NERC has.
  – Number of datasets available to view via the data centre web site (no breakdown): (metric in development). INSPIRE compliance; usable services.
  – Number of datasets available to download via the data centre web site (no breakdown): (metric in development). INSPIRE compliance; usable services.
  • 95. Data centre metrics (continued):
  – NERC-funded data centre staff (FTE): 14 (estimate for FY 14/15). Data management costs; efficiency. Number of full-time-equivalent posts employed to perform data centre functions.
  – Direct costs of data stewardship in data centre: reportable at end of financial year. Data management costs; efficiency; cost to NERC.
  – Capital expenditure directly related to data stewardship at data centre: reportable at end of financial year. Data management costs; efficiency.
  – Direct receipts from data licenses and sales: £0 (CEDA does not charge for data). Commercial value of data products and services.
  – Number of projects with outline data management plans: (metric in development). Means of tracking projects' adoption of good DM practice; the outline DMP is at proposal stage.
  – Number of projects with full data management plans: (metric in development). Means of tracking projects' adoption of good DM practice; the full DMP is at funded stage.
  – Users by area: UK 2,534 (61%); Europe 494 (12%); rest of the world 1,024 (25%); unknown 79 (2%). Active use; visibility of the data centre internationally; percentage of user base in terms of geographical spread.
  – Users by institute type: university 2,934 (71%); government 694 (17%); NERC 160 (4%); other 277 (7%); commercial 42 (1%); school 35 (1%). Active use; visibility of the data centre sectorially; percentage of user base in terms of the users' host institute type.
  • 96. After the data is downloaded, what happens then? Short answer: we don't know!! Unless the data user comes back to us to tell us, or we stumble across a paper which cites us, or mentions us in a way that we can find, and tells us what dataset the authors used. This is why we're working with other groups (like CODATA, Force11, RDA, DataCite, Thomson Reuters, …) to promote data citation.
  • 97. How we (NERC) cite data. We use digital object identifiers (DOIs) as part of our dataset citations because: • They are actionable, interoperable, persistent links for (digital) objects • Scientists are already used to citing papers using DOIs (and they trust them) • Academic journal publishers are starting to require that datasets be cited in a stable way, i.e. using DOIs • We have a good working relationship with the British Library and DataCite. NERC's guidance on citing data and assigning DOIs can be found at: http://www.nerc.ac.uk/research/sites/data/doi.asp
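A DOI-based dataset citation of the kind described here is just a structured string built from a few metadata fields. The helper below sketches one plausible rendering (creator, year, title, data centre, DOI); the exact ordering and punctuation, and the example dataset and DOI, are illustrative rather than NERC's official template.

```python
# Build a dataset citation string of the general form:
#   Creator (Year): Title. Data centre. doi:...
# The ordering/punctuation is illustrative, not NERC's exact template.

def cite_dataset(creators, year, title, publisher, doi):
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. doi:{doi}"

citation = cite_dataset(
    creators=["Callaghan, S."],
    year=2013,
    title="Example atmospheric dataset",    # hypothetical dataset
    publisher="British Atmospheric Data Centre",
    doi="10.5285/EXAMPLE",                   # hypothetical DOI
)
print(citation)
```

Because the DOI is a persistent, resolvable link, the same citation works on a landing page, in a reference list, or in machine-readable metadata.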
  • 98. Dataset catalogue page (and DOI landing page): dataset citation; clickable link to the dataset in the archive
  • 99. Another example of a cited dataset
  • 101. Data metrics – the state of the art! Data citation isn't common practice (unfortunately). Data citation counts don't exist yet. To count how often BADC data is used we have to: 1. Search Google Scholar for "BADC", "British Atmospheric Data Centre" 2. Scan the results and weed out false positives 3. Read the papers to figure out what datasets the authors are talking about (if we can) 4. Count the mentions and citations (if any). We're working with DataCite and Thomson Reuters to get data citation counts. http://www.lol-cat.org/little-lovely-lolcat-and-big-work/
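The manual workflow just described (search, weed out false positives, count mentions) can be partially automated once you have candidate paper texts. The sketch below shows the mention-counting step; the regular expressions, and the idea that a grant-code-style token counts as a false positive, are illustrative assumptions, not BADC's actual screening rules.

```python
# Rough sketch of steps 2 and 4 of the workflow above: scanning
# candidate paper texts for BADC mentions and weeding out false
# positives. The false-positive pattern is a hypothetical example.
import re

MENTION = re.compile(r"\b(BADC|British Atmospheric Data Centre)\b")
FALSE_POSITIVE = re.compile(r"\bBADC\b(?=-\d)")  # e.g. a grant code

def count_mentions(texts):
    """Count texts that mention the data centre, skipping false positives."""
    hits = 0
    for text in texts:
        if MENTION.search(text) and not FALSE_POSITIVE.search(text):
            hits += 1
    return hits

papers = [
    "Data were obtained from the British Atmospheric Data Centre.",
    "We used the BADC archive for radiosonde data.",
    "Funded under grant BADC-1234.",  # false positive, not a data mention
    "No relevant mention here.",
]
print(count_mentions(papers))  # → 2
```

Step 3 (working out *which* dataset was used) still needs a human reader, which is exactly why persistent dataset DOIs matter.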
  • 102. Altmetrics and social media for data? Mainly focussing on citation as a first step, as it's most commonly accepted by researchers. We have a social media presence, @CEDAnews, mainly used for announcements about service availability. We definitely want ways of showing our funders that we provide a good service to our users and the research community. And we want to be able to tell our depositors what impact their data has had!
  • 103. RDA/WDS WG Bibliometrics survey results: mostly expected. Citations are the preferred metric, downloads next. Standards are missing. Culture change is needed. What do you currently use to evaluate the impact of data? (Options: nothing; data citation counts; downloads; social media likes/shares/tweets; mentions in peer-reviewed papers; hits in search engines; mentions in blogs; bookmarks in Zotero and/or Mendeley; other.) Are the methods you use to evaluate impact adequate for your needs? Yes 31.5%, No 68.5%.
  • 104. Other projects in the data metrics space: 1. CASRAI data level metrics 2. PLOS Making Data Count 3. NISO altmetrics 4. Jisc Giving Researchers Credit for their Data
  • 105. Next steps for the Bibliometrics for Data WG. Will be based on: • WG survey results (presented at RDA P4 and P5) • Spreadsheet of metrics being collected by repositories – still open for contributions! http://bit.ly/1MpyW4K • Shared results from other projects – understanding the challenges and answering the questions posed in the case statement • Preliminary analysis of data DOI resolutions • Supporting and evaluating tools from other projects • Preliminary guidance for the community – "minimal" rather than "best" practice – get people discussing the issues and coming up with solutions!
  • 106. Thanks! Any questions? sarah.callaghan@stfc.ac.uk @sorcha_ni http://citingbytes.blogspot.co.uk/ "Publishing research without data is simply advertising, not science" - Graham Steel http://blog.okfn.org/2013/09/03/publishing-research-without-data-is-simply-advertising-not-science/ Image credit: Borepatch http://borepatch.blogspot.com/2010/06/its-not-what-you-dont-know-that-hurts.html
  • 107. Getting (and giving) credit for all that we do. Melissa Haendel. NISO Research Data Metrics Landscape: An update from the NISO Altmetrics Working Group B: Output Types & Identifiers. 11.16.2015 @ontowonka
  • 108. What *IS* "success"?
  • 109. It's not always what you see https://goo.gl/b60moX
  • 110. What is attribution???
  • 112. Over 1000 authors
  • 113. Project CRediT http://projectcredit.net
  • 114. Many contributions don't lead to authorship. BD2K co-authorship: D. Eichmann, N. Vasilevsky. 20% of key personnel are not adequately profiled using publications
  • 115. Some contributions are anonymous: data deposition; anonymous review. Image credit: http://disruptiveviews.com/is-your-data-anonymous-or-just-encrypted/
  • 116. The Research Life Cycle: EXPERIMENT – CONSULT – PUBLISH – DATA – FUND
  • 118. Diverse outputs, diverse impacts, diverse roles – each a critical component of the research process. Evidence of meaningful impact: • Measurement instruments • Continuing education materials • Cost-effective interventions • Changes in delivery of healthcare services • Quality measure guidelines • Gray literature • New experimental methods, data models, databases, software tools • New diagnostic criteria • New standards of care • Biological materials, animal models • Consent documents • Clinical/practice guidelines. https://becker.wustl.edu/impact-assessment http://nucats.northwestern.edu/
  • 119. Attribution workshop results: >500 scholarly products. EXAMPLE OUTPUTS related to software. Outputs: binary redistribution package (installer), algorithm, data analytic software tool, analysis scripts, data cleaning, APIs, codebook (for content analysis), source code, software to make metadata for libraries, archives and museums, program codes (for modeling), commentary in code (thinking of open source – need to attribute code authors and commentators/enhancers/hackers, who can document what they did and why), computer language (a syntax to describe a set of operations or activities), software patch (set of changes to code to fix bugs, add features, etc.), digital workflow (automated sequence of programs, steps to an outcome), software library (non-stand-alone code that can be incorporated into something larger), software application (computer code that accomplishes something). Roles: catalog, design, develop, test, hacker, bug finder, software developer, software engineer, developer, programmer, system administrator, execute, document, software package maintainer, project manager, database administrator
  • 120. Connecting people to their "stuff"
  • 121. Modeling & implementation. VIVO-ISF: a suite of ontologies that integrates and extends community standards
  • 122. Credit extends beyond the original contribution  Stacy creates mouse1  Kristi creates mouse2  Karen performs RNAseq analysis on mouse1 and mouse2 to generate dataset3, which she subsequently curates and analyzes  Karen writes publication pmid:12345 about the results of her analysis  Karen explicitly credits Stacy as an author but not Kristi
  • 123. Credit is connected: credit to Stacy is asserted, but credit to Kristi can be inferred
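The mouse1/mouse2/dataset3 example is essentially a provenance-graph traversal: asserted "derived from" links let implied credit be computed for contributors like Kristi who never appear on the author list. A minimal sketch follows, with the edge list hand-built from the slide's example; the dictionary-based graph is an illustrative encoding, not an openRIF/VIVO-ISF data structure.

```python
# Transitive credit over a provenance graph, following the
# mouse1/mouse2/dataset3 example from the slides. Edges mean
# "was derived from"; creators of any upstream artifact earn
# implied credit for the downstream publication.

DERIVED_FROM = {
    "pmid:12345": ["dataset3"],
    "dataset3": ["mouse1", "mouse2"],
}
CREATED_BY = {"mouse1": "Stacy", "mouse2": "Kristi", "dataset3": "Karen"}

def implied_credit(artifact):
    """Everyone who created the artifact or anything it derives from."""
    people, stack = set(), [artifact]
    while stack:
        node = stack.pop()
        if node in CREATED_BY:
            people.add(CREATED_BY[node])
        stack.extend(DERIVED_FROM.get(node, []))
    return people

print(sorted(implied_credit("pmid:12345")))  # → ['Karen', 'Kristi', 'Stacy']
```

The asserted authorship credits only Stacy and Karen; the traversal surfaces Kristi's implied credit automatically.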
  • 124. Introducing openRIF, the Open Research Information Framework (bringing together SciENcv, eagle-i, and VIVO-ISF)
  • 125. Ensuring an openRIF that meets community needs. Interoperability: a domain-configurable suite of ontologies to enable interoperability across systems; a community of developers, tools, data providers, and end-users
  • 126. Developing a computable research ecosystem. Research information is scattered amongst: research networking tools; citation databases (e.g., PubMed); award databases (e.g., NIH RePORTER); curated archives (e.g., GenBank); and locked up in text (the research literature). Map the SciENcv data model to VIVO-ISF/openRIF; enable bi-directional data exchange; integrate SciENcv and ORCID data into CTSAsearch. http://research.icts.uiowa.edu/polyglot/ CTSAsearch: David Eichmann
  • 127. Thank you! Join the FORCE11 Attribution Working Group at: https://www.force11.org/group/attributionwg Join the openRIF listserv at: http://group.openrif.org
  • 128. Identifying those scholarly outputs: identifiers for things that are not publications or documents; we need to get beyond thinking about DOIs