2. Agenda
• Introduction
• Updates from Ringgold
• Persistent IDs - Interoperability and initiatives
• Curated IP data and Identify Online
• Data enhancements - update
• Identify Online – the new interface
• Feedback Workshop and Q & A
• Wrap up
• Drinks
3. Tying things together
Ringgold’s Mission
to provide identifiers and structured data to power the efficient exchange of information throughout the scholarly research community ...and beyond.
4. Introduction
• New Customers:
• IEEE
• OCLC
• Peter Lang
• EDP Sciences
• Mary Ann Liebert
• BioOne
• Modern Language Association
• American Institute of Aeronautics and Astronautics
• Ringgold will be at the following conferences in early 2018
• PIDapalooza, Girona, Spain, January
• University Press Redux, London, February
• R2R Conference, London, February
• UKSG, Glasgow, April
• London Book Fair: The Faculty in association with ALPSP, April
7. Changes in the landscape for Organization IDs
• ISNI
• Becoming more open, separating the Org IDs from the Person IDs
• Supports JISC CASRAI report
• Response to community needs, not least for Open Linked Data
• OrgID group (Crossref, ORCID, DataCite)
• GLEIF – applications for Ringgold customers?
• Did you know?
• Open Funder Registry IDs – useful for APC automation
• IPEDS are linked to ARL library expenditure statistics
• Over to you
8. ISNI
[Diagram: an ISNI Number links a Ringgold ID (Ringgold hierarchies and metadata) with a Party ID 2 (open or proprietary information)]
9. ISNI Plans for Organization IDs
• Segmentation of organization records (650,000+) and respective ISNI IDs
• Searchable user interface
• ISNI Organization IDs and core metadata available under a CC0 licence
• Regular data downloads of entire ISNI Organizations Registry
• API for retrieval and resolution of ISNI Organization IDs
• An online form for organizations to supply updates to their own record
• A new Advisory Board with representatives from the scholarly communications community
10. Changes in the landscape for Organization IDs
• ISNI
• Becoming more open, separating the Org IDs from the Person IDs
• Supports JISC CASRAI report
• Response to community needs, not least for Open Linked Data
• OrgID group (Crossref, ORCID, DataCite)
• GLEIF – applications for Ringgold customers?
• Did you know?
• Open Funder Registry IDs – useful for APC automation
• IPEDS are linked to ARL library expenditure statistics
• Over to you
11. Over to you…
• What else would be useful to you?
• Data we could ingest?
• Things that are open and have an API?
• Data we could map to?
• Things like Funder IDs where we can hold the ID for you to link to the main data internally
• Where should we be striving to get Ringgold IDs used?
14. Data enhancements: areas of growth
Growth seen across all countries and across all sectors.
Top Countries 2017:
• US 43%
• IN 7%
• DE 6%
• GB 4%
• CN 3%
• CA 3%
• JP 3%
• FR 3%
• BR 2%
• AU 2%
• ROW 24%
16. Data enhancements: maintenance (how)
• Production department
• 8 permanent members of staff
• 44 dedicated researchers
• based in 35 countries
• expertise in 50+ languages
• Production system
• developed in house
• continually enhanced
17. Data enhancements: new data mappings
• Crossref Open Funder Registry
• v1.13 released Oct 9th 2017
• Mapped >14k Funder IDs to Ringgold records
• Available via:
• Weekly exports
• API
• Identify Online
18. Data enhancements: Libraries
• Previously libraries were stored as alternate names of main parent
• Now attempt to replicate the ‘real world’
• Admin relationship to department
• Library relationship to main library
• New types: Publisher types – real world
• academic/publisher
• corporate/publisher
• other/publisher
• other/learnedpublisher
19. Data enhancements: Places
• Used Google API to enhance existing Ringgold location information
• Currently:
• City
• State
• Country
• Enhanced:
• Name
• Administration Level (1 to 5)
• Long / Lat
• Multi-language
• Partial release into Identify Online
• Full release 2018 alongside existing data
20. Data enhancements: improved access
• API
• Upgraded to version 2.5
• Includes OFR data
• 4 methods
• getInstitution
• findInstitutions
• findInstitutionsByKeywords
• getInstitutionFamily
• Delta export
• Weekly export containing only changed records
• JSON and XML formats
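As a rough illustration of how a client might consume the weekly Delta export, the sketch below merges a JSON delta into a local copy keyed by Ringgold ID. The field names (`ringgold_id`, `name`) and the record layout are assumptions made for the example, not the documented export schema.

```python
import json


def apply_delta(local_records, delta_json):
    """Upsert every changed record from a delta export into a local copy.

    local_records: dict keyed by Ringgold ID (hypothetical layout).
    delta_json: JSON text holding a list of changed records.
    """
    for record in json.loads(delta_json):
        # "ringgold_id" is an assumed field name, used only for illustration.
        local_records[record["ringgold_id"]] = record
    return local_records


# Invented sample data: one changed record and one new record.
local = {
    1001: {"ringgold_id": 1001, "name": "Example University"},
    1002: {"ringgold_id": 1002, "name": "Old Name Institute"},
}
delta = json.dumps([
    {"ringgold_id": 1002, "name": "New Name Institute"},
    {"ringgold_id": 1003, "name": "Example Hospital"},
])
apply_delta(local, delta)
```

Because the delta contains only changed records, untouched entries such as 1001 stay as they were and no full reload is needed.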
22. Highlights
• Faster and more intuitive
• Powerful new metadata filters
• Hierarchy display and export
• Find direct and indirect customers
• Supports Boolean (AND, OR, NOT), wildcard (*) and fuzzy (~) operators, and exact phrases (“ ”)
• Refine, filter, sort and download your search results
• Save your search to review later or share with colleagues
• Simple Search to quickly find records: by name or Ringgold ID
• Searches on native spellings and alternative names
23. Highlights cont…
• Advanced Search enables detailed searching and prospecting:
• Search against any field in the Identify Database
• Use the new Academic Filters for targeted results
• Search & refine by subscriber and product detail
• Datasets:
• Combine and compare Advanced Search queries and/or existing Datasets
• Create new datasets, save and share with colleagues
• For example:
• Which customers take journal A and B but not C?
[Venn diagram: sets A, B and C]
25. Ideas Workshop and Q & A
• What do you want to see from Ringgold?
• How are you using Ringgold’s Services?
• What business challenges are you facing that could be met with data-driven solutions?
• Question and Answer session
The only thing that is certain as we develop is that our data about people, places and things, and the content itself, will continue to be created, transmitted and stored in a digital infrastructure, and be intercepted by a growing number of systems. Hence the focus on tying those systems together and creating uniform data through the chain. We will want to analyze our data, and parse our content in more granular ways, which can only be done effectively if we future-proof it with the use of proper identifiers and metadata.
Informal get together with our customers to talk about new developments at Ringgold, explore applications of the Identify database, and
We are still very much about interoperability and the supply chain, from researcher to reader. Ringgold is now connected to more identifiers than ever: IPEDS, PSI, OFR, NCES, Athens, ORCID, and through ISNI to everything else in our space. We are always going to consider more, but more of that in a moment. We continue to work with more organisations at different stages of the research life cycle and anticipate that the growing needs of a changing environment will mean that identifiers, hierarchies and metadata will become more important throughout, from research management to impact analytics.
Press release, sent in August, explains what the plan is and why it's important
The academic research community has displayed a requirement for an open OrgID that they can use, share and so forth, in support of the JISC CASRAI report, the OrgID group, and ISNI members such as universities, which are particularly interested in mapping author affiliations and in Linked Open Data. Core metadata will be name, variant names and location information.
Tim will let me know what else I can say.
OrgID group RFI – deadline was 1 Dec. Don’t know what is happening with it, seems in flux and undecided at present, will know more later in the month or January.
GLEIF – Global Legal Entity Identifier Foundation – comes from the financial industry but is interested in whether there are use cases for scholarly comms. Had a few chats with them, would be interested to know if there is value to publishers?
Currently: 480k
New records 2017: +38k (8.5k libraries)
500k records Q2 2018
Country   Organisations
US        15391
IN        2683
DE        2126
GB        1501
CN        1162
CA        1092
JP        1001
FR        981
BR        820
AU        814
ROW       8500
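The country counts above appear to be the figures behind the Top Countries 2017 percentages. A quick check, assuming each percentage is simply that country's share of the listed total, rounded to the nearest whole number:

```python
# Organisation counts per country, copied from the notes above.
counts = {
    "US": 15391, "IN": 2683, "DE": 2126, "GB": 1501, "CN": 1162, "CA": 1092,
    "JP": 1001, "FR": 981, "BR": 820, "AU": 814, "ROW": 8500,
}

total = sum(counts.values())

# Share of the listed total for each country, as a whole-number percentage.
shares = {country: round(100 * n / total) for country, n in counts.items()}

for country, pct in shares.items():
    print(f"{country}: {pct}%")
```

Under that assumption the computed shares agree with the chart, for example 43% for the US and 24% for the rest of the world.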
Ringgold Type          Organisations
academic/library       6515
corporate/pharma       1776
academic               1629
corporate/serv         1550
school                 1324
corporate/medsupport   1309
corporate/medprac      918
hospital               859
other/edu              855
Aim to update 100k records / year
Focussed: e.g. NHS England, Full hierarchy reviews, URL Ping tests, Times Higher Education reports, Korean postcode format changes etc.
Timestamp: Work on full review of older records
Duplicates
Spellings
Map to other sources of data helps maintain our records
Ringgold: +14k. Crossref: 15,441 active funders. Difference: duplicates, IDs reassigned, insufficient information to find web presence, e.g.:
Helene Morgan Babcock and Alfred Babcock Memorial Scholarship Trust
http://data.crossref.org/fundingdata/funder/10.13039/100010236
Ian Hames from PSI is going to present
The algorithm for matching our new Place records with the Google Places API (GPA) involved:
1. Creating a place_key (aka search key) composed of city, state and country_code, from orgs and altnames.
2. Running a batch processor that called GPA/autocomplete to give us predictions as to the possible and probable Google places.
3. If a single Local or Sublocal entity was predicted, we added the Google Place ID (GPI) to the place_key and moved to step 4 to read and compare the details. If more than one local prediction was returned from GPA, we saved up to 5 predictions and stopped, unable to assign one as the preferred choice.
4. For each GPI received, the batch processor called GPA/getDetails for the GPI, compared that record to our Place and updated our incorrect fields.
5. For each GPI received, we compared the received language to the 53 possible languages, and the batch process requested details on each GPI+language combination, creating alternate-language versions of our place records.
6. Using the place_keys created in step 1, we linked the original org. or altname record to the place_key to find the Ringgold Place ID (aka PID) and appended that to the org. or altname record.
7. If the autocomplete returns a sublocal entity, we store that and go back to step 2 to search the local_name given in the sublocal record. Our hope was to auto-resolve sub-localities like Brooklyn, NY, US to NY, NY, US and store both entities as a places hierarchy.
8. The batch process runs on our RingNode server, using an AWS message queue to hold requests for autocomplete, getDetails and getLanguageDetails.
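Steps 1 to 3 of the matching process could be sketched roughly as follows. This is not the production code: the key format, the `MatchResult` type, the `gpi-…` identifiers and the stand-in `autocomplete` function are all assumptions made for illustration, and no real Google Places call is made.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def make_place_key(city: str, state: str, country_code: str) -> str:
    """Step 1: build a search key from city, state and country code.

    Lower-casing and the "|" separator are assumptions for this sketch.
    """
    return "|".join(part.strip().lower() for part in (city, state, country_code))


@dataclass
class MatchResult:
    place_key: str
    place_id: Optional[str] = None                        # set for one prediction
    candidates: List[str] = field(default_factory=list)   # kept when ambiguous


def match_place(place_key: str,
                autocomplete: Callable[[str], List[str]],
                max_saved: int = 5) -> MatchResult:
    """Steps 2-3: accept a single prediction, otherwise save up to five and stop."""
    predictions = autocomplete(place_key)
    if len(predictions) == 1:
        return MatchResult(place_key, place_id=predictions[0])
    return MatchResult(place_key, candidates=predictions[:max_saved])


# Stand-in for the autocomplete call, backed by invented data.
fake_index = {
    "girona||es": ["gpi-001"],
    "springfield||us": ["gpi-101", "gpi-102"],
}


def fake_autocomplete(key: str) -> List[str]:
    return fake_index.get(key, [])


resolved = match_place(make_place_key(" Girona ", "", "ES"), fake_autocomplete)
ambiguous = match_place(make_place_key("Springfield", "", "US"), fake_autocomplete)
```

Passing the autocomplete function in as a parameter keeps the matching logic testable without network access. An ambiguous key ends up with saved candidates and no assigned place ID, mirroring the "saved up to 5 predictions and stopped" behaviour in step 3.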
Numbers:
API – 8k calls per hour
Licences – 30
Delta – 9 clients signed up to download
Searches on both native spellings and alternative names
Use datasets to compare 2 or more distinct sets of data; combine (de-dupe) results from searches or datasets; exclude specific values from a query; create complex interrogations of your data.
Dataset operators:
• UNION: joins sets and eliminates duplicates
• INTERSECT: displays records common to 2 or more sets
• DIFFERENCE: finds records that are unique to joined sets
• FILTER: refines a dataset on specific criteria
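The dataset operators behave much like ordinary set operations. The sketch below uses Python sets of invented Ringgold IDs to answer the slide's example question, "Which customers take journal A and B but not C?". DIFFERENCE is modelled here as plain set difference, which is an assumption about how Identify Online defines it, and the `country` metadata is hypothetical.

```python
# Invented subscriber sets, keyed by made-up Ringgold IDs.
journal_a = {"1001", "1002", "1003", "1004"}
journal_b = {"1002", "1003", "1004", "1005"}
journal_c = {"1004", "1006"}

# UNION: join sets and eliminate duplicates.
a_or_b = journal_a | journal_b

# INTERSECT: records common to both sets.
a_and_b = journal_a & journal_b

# DIFFERENCE (assumed to be set difference): remove subscribers of C.
a_and_b_not_c = a_and_b - journal_c

# FILTER: refine a dataset on a criterion, here a hypothetical country field.
country = {"1001": "US", "1002": "GB", "1003": "US",
           "1004": "DE", "1005": "GB", "1006": "US"}
us_only = {rid for rid in a_and_b_not_c if country[rid] == "US"}

print(sorted(a_and_b_not_c))
print(sorted(us_only))
```

Chaining the operators in this order keeps each intermediate result available as its own dataset, which matches the idea of saving and sharing datasets for later reuse.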