Persistent identifiers (Pids) provide machine-actionable links to data and metadata that are vital to APIs (application programming interfaces) for publishing and citation. APIs are essentially request/response patterns that use Pids to reference things and metadata to describe not only the things themselves but also any actions requested or taken. As a result, metadata design and standardization are wedded to API design and enhancement. With Pids as nouns and metadata as adjectives and qualifiers, Pid services play a key role in API implementation.
2. 2
Decoding the title
persistent identifiers = things = nouns
services = actions = verbs
metadata = descriptors = adjectives
= preserving and serving scholarly communication around data
(Context: scholarly research data)
4. 4
An identifier is not a string of characters
An identifier is an association between a string and a thing.
An association is an opinion asserted by an authority.
Example 1:
http://allrecipes.com/recipe/sauteed-fiddleheads
Example 2:
4CF3-57AB-2481-651D-D53D-Q
http://dx.doi.org/10.5072/4CF3-57AB-2481-651D-D53D-Q
http://dx.doi.org/10.5240/4CF3-57AB-2481-651D-D53D-Q
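The point that one string can stand in more than one association can be sketched in code. The following is a minimal illustration, not anyone's production data model: the `Association` class, the `resolve` function, and the placeholder "things" are all hypothetical, and treating the DOI prefix as the asserting authority is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Association:
    """An identifier in the slide's sense: a string bound to a thing by an authority."""
    string: str     # the character string people usually call "the identifier"
    thing: str      # placeholder description of what the string refers to
    authority: str  # who asserts the association (here, a DOI prefix; an assumption)


def resolve(string: str, authority: str, registry: List[Association]) -> Optional[str]:
    """Return the thing that `authority` says `string` refers to, or None."""
    for assoc in registry:
        if assoc.string == string and assoc.authority == authority:
            return assoc.thing
    return None


SUFFIX = "4CF3-57AB-2481-651D-D53D-Q"

# Hypothetical registry: the "thing" values are placeholders, not real records.
registry = [
    Association(SUFFIX, "placeholder thing A", "10.5072"),
    Association(SUFFIX, "placeholder thing B", "10.5240"),
]

print(resolve(SUFFIX, "10.5072", registry))  # what the first authority asserts
print(resolve(SUFFIX, "10.5240", registry))  # what the second authority asserts
print(resolve(SUFFIX, "10.9999", registry))  # no authority, no association
```

The same string yields different answers, or none, depending on who is asked, which is why the string alone is not the identifier.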
5. 5
Identifier schemes (v1)
• URL (Uniform Resource Locator)
• the first time poor id management is blamed on syntax
• URN (Uniform Resource Name)
• first attempt to correct poor id management with syntax
• Handle
• second attempt to correct poor id management with syntax
• DOI (Digital Object Identifier)
• third attempt to correct poor id management with syntax
• ARK (Archival Resource Key)
• attempt to let id management be queryable (not yet realized)
6. 6
Identifier schemes (v2)
• URL (Uniform Resource Locator)
• world’s first actionable id, now underlying all other types
• URN (Uniform Resource Name)
• open infrastructure, not fully realized globally
• Handle
• closed infrastructure, fully realized globally
• DOI (Digital Object Identifier)
• CrossRef enforces good id management, DataCite learning
• ARK (Archival Resource Key)
• open infrastructure, realized locally and globally
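Since URLs now underlie the other identifier types, one way to picture the relationship is a small function that turns a scheme-prefixed identifier into an actionable URL. This is a sketch under stated assumptions: the `doi:`, `hdl:`, and `ark:` prefixes are a common writing convention rather than something these slides define, `dx.doi.org` and `n2t.net` are taken from elsewhere in the slides, `hdl.handle.net` is the public Handle proxy, and the ARK and Handle values in the usage lines are made up.

```python
# Resolver base URLs, keyed by scheme prefix (the table itself is illustrative).
RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/",
}


def actionable_url(identifier: str) -> str:
    """Map a scheme-prefixed identifier to a URL that a browser or script can follow."""
    if identifier.startswith(("http://", "https://")):
        return identifier  # a URL is already actionable
    scheme, sep, rest = identifier.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in RESOLVERS:
        # URNs land here: this sketch knows no global resolver for them.
        raise ValueError(f"no resolver known for {identifier!r}")
    if scheme == "ark":
        # ARKs keep their "ark:" label inside the URL path.
        return RESOLVERS["ark"] + "ark:" + rest
    return RESOLVERS[scheme] + rest


print(actionable_url("doi:10.5072/4CF3-57AB-2481-651D-D53D-Q"))
print(actionable_url("ark:/99999/fk4demo"))   # made-up ARK
print(actionable_url("hdl:12345/demo"))       # made-up Handle
```

Whether the resulting URL keeps working is a matter of id management by the authority behind it, not of the syntax produced here.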
7. 7
If DOIs won, why talk about non-DOIs?
• Cost
• Open access
• Changing nature of the DOI
• Flexibility
8. 8
Types of identifier services
• Repository – parking the bits
• Data-aware dissemination
• more than just returning parked bits
• Citation management for end user researchers
• Research tracking – measuring use and impact
• Identifier creation, management, and resolution
9. 9
Many service tools, many APIs
Repository Tools
• ArXiv *
• Dataverse *
• Fedora/Hydra
• DSpace *
• EPrints
• DataONE
• Merritt/Stash
• figshare
• Zenodo
Citation Management
• Mendeley
• Zotero
Metrics and Tracking
• Altmetric
• Impactstory
• Thomson Reuters Data Citation Index
• Elsevier Scopus
10. 10
API concepts
Application Programming Interface (API)
• how software talks to a service
• unlike a Graphical User Interface (GUI)
• more like a Command Line Interface (CLI)
APIs and CLIs use language constructs
• Verbs, nouns, and qualifiers are “words”, and
• words form commands/requests/responses,
• which form scripts and programs.
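The progression from words to commands to scripts can be sketched as a tiny parser. Everything here is hypothetical: the `Sentence` class, the verb-noun-qualifiers word order, the `set` and `get` verbs, and the example identifier are invented for illustration and do not describe any particular service's API.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    verb: str              # the action requested
    noun: str              # the thing acted on (for example, a Pid)
    qualifiers: List[str]  # metadata words that refine the request


def parse_script(script: str) -> List[Sentence]:
    """Split a script into lines, and each line into verb, noun, and qualifiers."""
    sentences = []
    for line in script.splitlines():
        # shlex honors quoted phrases and, with comments=True, drops "#" comments.
        words = shlex.split(line, comments=True)
        if not words:
            continue  # blank line or comment-only line
        if len(words) < 2:
            raise ValueError(f"need a verb and a noun: {line!r}")
        sentences.append(Sentence(words[0], words[1], words[2:]))
    return sentences


# Hypothetical script: the verbs, the identifier, and the qualifiers are made up.
SCRIPT = """
# describe a thing, then read it back
set ark:/99999/fk4demo title "Sauteed fiddleheads"
get ark:/99999/fk4demo title
"""

for sentence in parse_script(SCRIPT):
    print(sentence)
```

Each line is one sentence, and the list of sentences is the script.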
11. 11
APIs are metadata sentences
A command line interface powering an API interaction
$ sort mydata > sorted_data
$ grep Smith sorted_data
Smith, Sally 2014-04-01 406B
Wong, Frank 2013-11-28 334
$ wget --user=sam --no-check-certificate \
    "https://n2t.net/a/ezid/b?set cost 25.50"
status: ok
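The wget request above carries a small sentence in its query string. The sketch below assembles such a URL from its parts and percent-encodes the spaces; it sends nothing over the network. The function name `request_url` and the reading of `a/ezid/b` as the noun, `set` as the verb, and `cost 25.50` as the qualifiers are assumptions made for illustration.

```python
from urllib.parse import quote


def request_url(base: str, noun: str, verb: str, *qualifiers: str) -> str:
    """Build a request URL whose query string is a verb followed by its qualifiers."""
    sentence = " ".join((verb,) + qualifiers)
    # quote() percent-encodes the spaces so the sentence travels safely in a URL.
    return f"{base}/{noun}?{quote(sentence)}"


url = request_url("https://n2t.net", "a/ezid/b", "set", "cost", "25.50")
print(url)  # https://n2t.net/a/ezid/b?set%20cost%2025.50
```

The path names the thing and the query says what to do with it, so the whole URL reads as noun, verb, qualifiers.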
12. 12
Problem: traditional standardization
• Change by committee is ugly, costly, and slow
• Example: Dublin Core, 15 cross-domain terms
European Parliament Technology - DG ITEC @ flickr
18. 18
An alternate metadata universe
• Vision: one dictionary, one namespace
• All research domains, any part of “metadata speech”
• Names, values, units, relationships, ...
• Search for terms, comment on terms, add terms, edit your terms, API for automated access
• All terms with globally unique persistent identifiers
• Available at yamz.net (yet another metadata zoo)
19. 19
YAMZ.net dictionary sociology
• Crowd-sourced evolving vernacular terms, stable canonical terms, and deprecated terms
• Use evolving terms depending on your risk tolerance
• Reputation-based (gaming-resistant) voting means strong terms rise, weak terms decline
Applying lessons learned from Wikipedia, the Internet-Draft/RFC process, and StackOverflow
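One way to see why reputation weighting resists gaming is a toy score in which each vote counts in proportion to the voter's reputation. This is a generic illustration only: the formula, the function name `term_score`, and the sample numbers are assumptions, and nothing here describes how yamz.net actually computes its scores.

```python
from typing import List, Tuple


def term_score(votes: List[Tuple[float, int]]) -> float:
    """Reputation-weighted score in [-1, 1].

    Each vote is (voter_reputation, direction), with direction +1 (up) or -1 (down).
    """
    total_reputation = sum(rep for rep, _ in votes)
    if total_reputation == 0:
        return 0.0  # no votes, or only zero-reputation voters
    return sum(rep * direction for rep, direction in votes) / total_reputation


# Hypothetical votes: one established contributor up, two new accounts down.
votes = [(10.0, +1), (1.0, -1), (1.0, -1)]

weighted = term_score(votes)
unweighted = sum(direction for _, direction in votes) / len(votes)

print(weighted > 0)    # the weighted score stays positive
print(unweighted < 0)  # a simple head count would have turned negative
```

Under this toy rule, a burst of votes from low-reputation accounts moves a term's score far less than the same number of votes from established contributors.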
20. 20
Summary
• Identifiers are not strings, but associations that break when things are not managed well
• People can forget names because we can google, but APIs need persistent names for automation at scale
• APIs are languages using metadata as “words”
• Future API building will focus on vocabulary building
• For example, yamz.net
Thank you!
John.Kunze@ucop.edu