Ringgold is one of several organizations that are putting forth ideas to standardize data and data exchange throughout scholarly publishing. This session discussed new initiatives that address such challenges as standardizing conflict of interest reporting, easily identifying funding sources, clarifying contributor roles for research papers, and managing institution disambiguation.
6. In order to be effective, identifiers must be:
Governed
Trusted
Transparent
And contain appropriate metadata
*This means data that can be linked together through unambiguous identification and exchanged with others
7. Persistent numeric or alpha-numeric designations associated with a single entity
Entities can be an institution, person, or piece of content (People, Places, & Things)
1. Disambiguate, aka enforce uniqueness
2. Enable linking, aka data integration and interoperability
In other words, they provide a simple basis for data governance
8. ◦ Break down silos
◦ Keep data current and synchronised
◦ Enable staff to interact with data more effectively
◦ Simplify data exchange
◦ Improve overall data quality
[Diagram labels: Institutional Identifiers; CRM; Electronic document storage; Usage statistics; Author Database; Fulfilment system; Membership system; License Validation; Manuscript Submission System]
9. • Resources & personnel required to join existing records to IDs or an authority file
• Build customized solutions mapping systems together
• Improve data capture to require an ID upon record creation
• Manual vs programmatic cost-benefit questions
• Design new reporting and analysis tools to leverage newly linked datasets
10. Researchers – create Current Research Information Systems (CRIS) – one portal to figure out how to best conduct research, who to work with, who will fund it, what else has been contributed to the subject thus far, where is the best equipment to help further the research.
Funders – Want to track areas of interest, identify worthwhile pursuits, and see where their money goes.
Institutions – Demonstrate research output more accurately and precisely describe the institution’s contribution and who is affiliated with that work.
Publishers – Facilitate transactions of all types from content discovery to delivery of author royalties. Improved market analysis and targeted advertising.
11. ISO Standard 27729
ISNI is designed to be a “bridge identifier”
Covers any type of entity
[Diagram: Party ID 1 and Party ID 2, each paired with an ISNI Number and with its own Proprietary Information and/or Metadata]
12. In cooperation with ProQuest, OCLC, and other public and commercial entities, Ringgold has been working to map ISNIs to deeper datasets for the past two years.
It’s taken time due to the problems with the raw source data, and the policies for assignment of the unique ISNI identifier.
13. At the same time ISNI records are loaded to the Ringgold Identify Database we will be issuing ISNIs for institutions.
ProQuest (Bowker) is a Registration Agency as well, focusing on individuals.
17. It was a desire to “help” authors differentiate and disambiguate themselves that got ISNI started.
Along the way, a lot has been learned. A specific example that often doesn’t get a lot of attention is the need for privacy protection whenever there is an identification process underway. This holds true for individuals and institutions.
Our industry spends a great deal of time discussing “open data”, but there are many times when that data should not (or cannot) be made public (physicist romance author, animal tester, military applications, etc.).
18. The Semantic Web cannot exist without well structured data
Things take on a life of their own
The challenges to creating a world of content tagged with meaning:
Vastness
Vagueness
Uncertainty
Inconsistency
Deceit
Standard Identifiers can help with the middle three – Artificial Intelligence will handle Vastness and Deceit
Let’s take a moment to orient ourselves on the big picture…
Our trees are interesting! Publications, vendors, authors… all the people, places, and things can be described using standard taxonomies and identifiers.
This aerial view of our forest home provides a bit more perspective, but we’re really headed to a place where we can use standardized descriptions to develop new information.
Here, we’ve virtualized our understanding of the world by using data. From this perspective, not only can we look at things far beyond our immediate sight, but we are able to view our surroundings in different contexts and with much deeper analysis than by simply looking around. This is where we are headed when looking at the universe of people, the world of places, or all the stuff in it. We used to look at long lists of people we thought might be customers, and those that were already customers. Now we have ways to better understand who is really using our content, who is funding the most highly accessed research, and who the individuals and institutions involved are.
You’ll note that I’ve used the term “Standard Identifiers” as opposed to just “Standards”… I’ll be focusing on using standard identifiers as the main data hooks that will allow us to aggregate information for the purpose of synthesizing knowledge.
Interoperability implies communication; how we communicate something is very different than how we describe things.
I should take a moment to clarify that the Ringgold ID is not a standard, or not an ISO-certified standard in any case, but in many cases our data has become a de facto standard through application. Some of you might be wondering what, then, constitutes a big “S” Standard: if any system uses a predefined taxonomy as an authority file to validate data (thereby achieving identical data entries for each and every instance where it is needed), then a standard has been achieved.
How data is exchanged is quite different than the data itself, and of course, standards may be applied to both. For my part, I’m going to talk about the data itself, not how it is exchanged. So, in terms of the data itself, what are we trying to standardize? Descriptions: the wrapper around highly unique content. More importantly, as an industry (as a species, really) we are creating data elements that can be interpreted by machines, or I should say, easily interpreted by machines. I’ll come back to this topic near the end of my presentation.
INTERNAL – Let’s look at your own ecosystem.
Linking of data: Enable staff to use your data more efficiently, and keep the same view of an institution regardless of what system they are using. See overlaps and outliers when comparing two or more datasets.
Example 1: Compare your fulfillment system (the active subscriber list) with your document storage system, and see which subscribers have never submitted their license agreement.
Example 2: We’ve got a client that uses 3 systems to take and fulfill institutional subscriptions: CRM, authentication, and an accounting platform. Before linking these systems up with identifiers, there were disconnects that affected their clients: sometimes it was impossible to tell why the auth system was granting journal access to a particular institution – the access seemed unconnected to the payment.
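The comparison in Example 1 can be sketched in a few lines once both systems carry the same institutional identifier. This is only an illustration: the field name `ringgold_id`, the function name, and all sample values are invented for the sketch and do not come from any real system.

```python
# Hypothetical exports from two internal systems. The field name
# "ringgold_id" and every value below are invented for illustration.
active_subscribers = [
    {"ringgold_id": 1001, "name": "Univ. of Example"},
    {"ringgold_id": 1002, "name": "Example Institute of Technology"},
    {"ringgold_id": 1003, "name": "Sample College"},
]
stored_licenses = [
    {"ringgold_id": 1001, "document": "license_a.pdf"},
    {"ringgold_id": 1003, "document": "license_b.pdf"},
]


def subscribers_missing_license(subscribers, licenses, key="ringgold_id"):
    """Return subscriber records whose identifier has no license on file."""
    licensed_ids = {record[key] for record in licenses}
    return [s for s in subscribers if s[key] not in licensed_ids]


missing = subscribers_missing_license(active_subscribers, stored_licenses)
for record in missing:
    print(record["ringgold_id"], record["name"])
```

Because the match is made on the identifier rather than on the institution name, spelling differences between the two systems do not affect the result.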
Loads of benefits: IF STANDARDS ARE INTEROPERABLE
Bridge Identifier: this is an extremely important concept. There are identifiers, and there’s data… and while identifiers are data, not all identifiers operate or are maintained in the same way, and this is the important difference between an ISNI and a Ringgold ID.
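As a rough sketch of the bridge idea: two parties each keep their own IDs and proprietary metadata, and both also store the ISNI, so their records can be matched without either side exposing its internal data model. The party IDs, field names, and placeholder ISNI strings below are all hypothetical.

```python
# Each party keeps its own identifier and proprietary metadata, and also
# stores the ISNI. All IDs, field names, and values are placeholders.
party1_records = {
    "P1-17": {"isni": "ISNI-PLACEHOLDER-A", "account_manager": "J. Doe"},
    "P1-42": {"isni": "ISNI-PLACEHOLDER-B", "account_manager": "R. Roe"},
}
party2_records = {
    "P2-900": {"isni": "ISNI-PLACEHOLDER-B", "holdings_count": 1250},
    "P2-901": {"isni": "ISNI-PLACEHOLDER-C", "holdings_count": 80},
}


def build_crosswalk(left, right):
    """Map each left-hand party ID to the right-hand party ID sharing its ISNI."""
    right_by_isni = {rec["isni"]: party_id for party_id, rec in right.items()}
    return {
        left_id: right_by_isni[rec["isni"]]
        for left_id, rec in left.items()
        if rec["isni"] in right_by_isni
    }


crosswalk = build_crosswalk(party1_records, party2_records)
print(crosswalk)
```

Only the pair of party IDs that share an ISNI ends up in the crosswalk; the proprietary metadata on each side never has to be exchanged.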
Mention that we are now board members (Laura).
(ISNI straddles persons and institutions, so this will make a nice segue.)
INTERNATIONAL STANDARD NAME IDENTIFIER, ISO standard.
SCOPE: ISNI is meant to identify all things considered to be public parties, most of which are creators of content or otherwise appear in library & union catalogs (including fictional characters?). Typical records hold name variants, as you can see here. It is not limited to the scholarly or research sector, but covers all manner of popular authors, musicians, and contributors. (The original ISNI dataset was populated with VIAF records and other bibliographic sources, such as the Library of Congress and other international sources.)
Our relationship: RIN is an ISNI registration agency, which means we will be working as a conduit for new record creation within our scope, which is primarily institutions in the scholarly supply chain. It is our plan to hold ISNIs for all institutions in our Identify database, and we are now working to ensure that all ISNI records which map to RINs are correct, and that we can achieve clean one-to-one matches. We are also working with them to create new ISNI IDs for institutions that are in RIN but not yet in ISNI. By mapping our database completely to theirs, we hope to put our clients at the starting line, so that they may maximize their supply chain linking.
To look at a few specific records: Here’s an ISNI, but an institutional record rather than the personal record we saw earlier. Again, note all the name variants as they appear in library holdings records.
I should mention that this record illustrates one of the biggest problems everyone is confronted with: the Many-to-One (one “Golden Record”, as one major publisher refers to their internal authoritative record). Here we have many names for the same institution, all attributed to the one ISNI. This is not unlike what Ringgold does: we have ‘alternate names’ for each Ringgold ID stored within the Identify database, and by the end of June, the ISNI names will also be linked to the Ringgold ID (mostly; there’s not a 100% one-to-one match between ISNI and Ringgold, but that’s another story).
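To make the many-to-one idea concrete, here is a minimal sketch of resolving name variants to a single record ID. It is not a description of how ISNI or the Identify database actually store or match names: the IDs, the sample names, and the simple normalization rule are all assumptions made for illustration.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


# Hypothetical authority file: one ID, many alternate names.
authority_file = {
    "ID-0001": ["University of Example", "Univ. of Example", "Université d'Exemple"],
    "ID-0002": ["Sample College", "Sample Coll."],
}

# Invert the authority file so every normalized variant points at one ID.
name_index = {
    normalize(variant): record_id
    for record_id, variants in authority_file.items()
    for variant in variants
}


def resolve(name):
    """Return the single record ID for a name variant, or None if unknown."""
    return name_index.get(normalize(name))


print(resolve("UNIV OF EXAMPLE"))
print(resolve("Unknown Institute"))
```

A lookup like this only catches variants already listed in the authority file; real disambiguation needs curated alternate names, which is the work the notes above describe.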