Oliver Pesch presentation on SUSHI and IOTA projects, part of the LITA/ALCTS Electronic Resources Management Interest Group meeting held at ALA Midwinter in Seattle, WA, on January 27, 2013
Meeting the Challenge / NISO update
1. Meeting the Challenge: Successful
Electronic Resources Management in the
Absence of a Perfect System
NISO Update on IOTA and SUSHI
Oliver Pesch
Chief Strategist, E-Resources, EBSCO Information
Services
4. SUSHI
WHAT IT IS ...
An ANSI/NISO Standard (NISO Z39.93-2007)
Defines automated request and response model
for harvesting e-resource usage data
Designed to work with COUNTER, the most
frequently retrieved usage reports
5. SUSHI
HOW TO USE IT …
Works behind-the-scenes
It is a client-server technology used by usage
consolidation solutions (e.g. ERM systems) and
content providers
Content providers develop a SUSHI Server to
deliver COUNTER statistics
Usage consolidation solutions include a SUSHI
client to automatically retrieve usage on a
scheduled basis or on demand
6. SUSHI
WHY YOU SHOULD USE IT …
It replaces the time-consuming user-mediated
collection of usage data reports
The protocol is generalized and
extensible, meaning it can be used to retrieve a
variety of usage reports
7. SUSHI
CURRENT STATUS…
Many resources available on SUSHI web site:
http://www.niso.org/workrooms/sushi
40+ content providers support SUSHI
(SUSHI Server Registry: https://sites.google.com/site/sushiserverregistry)
Works with all COUNTER reports
Ready for COUNTER Release 4
SUSHI support is an enforced requirement for
COUNTER compliance with Release 4
8. SUSHI
THE COMMITTEE…
Bob McQuillan, Innovative Interfaces Inc. (Co-chair)
Oliver Pesch, EBSCO Information Services (Co-chair)
Marie Kennedy, Loyola Marymount University
Chan Li, California Digital Library
John Milligan, ScholarlyIQ
Paul Needham, Cranfield University
James Van Mil, University of Cincinnati Libraries
9. SUSHI
CURRENT ACTIVITIES…
◦ Continued education and awareness
◦ Renovating the web site
◦ Exploring “SUSHI Lite” – a protocol that
would be based on JSON
11. IOTA
WHAT IS IT…
◦ A working group focused on OpenURL
quality…
◦ Using analytics to provide a quantitative measure of
quality of OpenURLs provided by “Sources”
◦ Created the Completeness Index as a measure of
quality
◦ Developed an interactive online tool to provide
analysis and reporting on real OpenURL log files
◦ Producing a Technical Report and
Recommended Practice related to OpenURL
quality
12. IOTA
COMPLETENESS INDEX…
Based on premise that the success of a link can
be affected by the data provided in the
OpenURL
Identify the required metadata elements
Determine a “weight” for each element to
reflect importance
Score an OpenURL by adding weights for all
elements provided divided by the total if all
elements appeared
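The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the working group's implementation: the function name `completeness_score`, the dictionary layout, and the three-element weights are all hypothetical.

```python
def completeness_score(present_elements, weights):
    """Sum of weights for the elements provided, divided by the sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    earned = sum(w for name, w in weights.items() if name in present_elements)
    return earned / total


# Hypothetical weights (not from the slides): ISSN counts twice as much as the others.
weights = {"issn": 2.0, "volume": 1.0, "spage": 1.0}

# ISSN and volume are provided, start page is missing: (2 + 1) / 4
print(completeness_score({"issn", "volume"}, weights))  # 0.75
```

An OpenURL that supplies every element scores 1.0, and one that supplies none scores 0.0, so scores from different sources can be compared on the same scale.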
13. IOTA
Simple example assuming equal element weights
Element  Description          Weight  This OpenURL
ATitle   Article title        1
AuLast   Author’s last name   1
Date     Date of publication  1
ISSN     ISSN                 1
Issue    Issue number         1
SPage    Start page           1
Title    Journal Title        1
Volume   Volume number        1
TOTAL                         8
14. IOTA
Simple example assuming equal element weights
SAMPLE OPENURL DATA
?date=2/4/2008
&issn=1083-3013
&volume=13
&issue=20
&atitle=the+casualties+of+war
Element  Description          Weight  This OpenURL
ATitle   Article title        1       1
AuLast   Author’s last name   1
Date     Date of publication  1       1
ISSN     ISSN                 1       1
Issue    Issue number         1       1
SPage    Start page           1
Title    Journal Title        1
Volume   Volume number        1       1
TOTAL                         8       5
Completeness Score = (Total for This OpenURL) / (Total Weights) = 5 / 8 = .625
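The worked example on this slide can be reproduced with Python's standard `urllib.parse` module. This is a sketch under the slide's simplifying assumption that every element has weight 1; the helper name `score_openurl` and the lowercase element keys are assumptions made for the illustration.

```python
from urllib.parse import parse_qs

# Core elements from the slide, each with an equal weight of 1.
WEIGHTS = {
    "atitle": 1, "aulast": 1, "date": 1, "issn": 1,
    "issue": 1, "spage": 1, "title": 1, "volume": 1,
}


def score_openurl(query, weights=WEIGHTS):
    """Score an OpenURL query string against the weighted element list."""
    # parse_qs drops parameters with blank values by default.
    params = parse_qs(query.lstrip("?"))
    present = {key.lower() for key in params}
    earned = sum(w for name, w in weights.items() if name in present)
    return earned / sum(weights.values())


sample = "?date=2/4/2008&issn=1083-3013&volume=13&issue=20&atitle=the+casualties+of+war"
print(score_openurl(sample))  # 0.625 (5 of 8 elements present)
```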
15. IOTA
RECOMMENDED PRACTICE…
Defines a technique for determining element
weights
Tested with real link resolvers and real
OpenURLs
Based on research which looked for a
correlation between the data elements on the
OpenURL and the “success” of the OpenURL
16. A Statistical Approach to
Determining Element Weights
Select a set of “perfect” OpenURLs
◦ include all key data elements and resolve to full
text
Perform step-wise regression
◦ Test failure rates for each element by removing
that element
Use failure rates as basis for weights
Use weights to calculate Completeness
Scores and to test for correlation between
weights and success for larger sample
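The remove-one-element test described above might look like the following in Python. This is only a sketch: `resolves` is a hypothetical callable standing in for a real link resolver, and `toy_resolver` with its matching rule is invented purely so the example runs.

```python
def failure_rates(perfect_openurls, elements, resolves):
    """For each element, remove it from every OpenURL and record the share that fail.

    perfect_openurls: list of dicts mapping element name -> value
    resolves: hypothetical stand-in for a link resolver; returns True
              when the OpenURL still resolves to full text
    """
    if not perfect_openurls:
        raise ValueError("need at least one OpenURL to test")
    rates = {}
    for element in elements:
        failures = 0
        for openurl in perfect_openurls:
            reduced = {k: v for k, v in openurl.items() if k != element}
            if not resolves(reduced):
                failures += 1
        rates[element] = failures / len(perfect_openurls)
    return rates


# Toy resolver (an assumption for the demo): needs volume plus issue or start page.
def toy_resolver(openurl):
    return "volume" in openurl and ("issue" in openurl or "spage" in openurl)


sample = [{"volume": "13", "issue": "20", "spage": "1", "aulast": "Smith"}] * 4
print(failure_rates(sample, ["volume", "issue", "aulast"], toy_resolver))
# {'volume': 1.0, 'issue': 0.0, 'aulast': 0.0}
```

Elements whose removal causes more failures matter more to link success, which is why the failure rates serve as the basis for the weights.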
17. Failure Rates from 1500 OpenURL
test sample
Element removed
from the OpenURL  Description                                 Failure Percentage
ATitle            Article title                               .74%
AuLast            Author’s last name                          .07%
Date              Date of publication                         .4%
ISSN              ISSN (either online or print ISSN)          22.02%
Issue             Issue number                                20.27%
SPage             Start page                                  33.27%
Title             Journal Title (either Title or Jtitle)      .61%
Volume            Volume number                               74.14%
Volume is most critical
Author’s last name is least important
Date is surprisingly low
18. Calculated Element Weights
Element  Description                             Weight*
ATitle   Article title                           1.87
AuLast   Author’s last name                      0.83
Date     Date of publication                     1.61
ISSN     ISSN (either online or print ISSN)      3.34
Issue    Issue number                            3.31
SPage    Start page                              3.52
Title    Journal Title (either Title or Jtitle)  1.78
Volume   Volume number                           3.87
*Element weight calculation: log10 (failure-rate-per-10,000 OpenURLs)
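The footnoted formula can be checked against the failure percentages from the 1500-OpenURL test sample. This is a minimal sketch; the function name `element_weight` is an assumption, and the conversion simply treats 1% as 100 failures per 10,000 OpenURLs.

```python
import math

# Failure percentages as printed on the failure-rate slide.
failure_pct = {
    "ATitle": 0.74, "AuLast": 0.07, "Date": 0.4, "ISSN": 22.02,
    "Issue": 20.27, "SPage": 33.27, "Title": 0.61, "Volume": 74.14,
}


def element_weight(failure_percentage):
    """log10 of the failure rate expressed per 10,000 OpenURLs."""
    per_10000 = failure_percentage * 100  # 1% = 100 failures per 10,000
    if per_10000 <= 0:
        raise ValueError("failure rate must be positive to take a logarithm")
    return math.log10(per_10000)


for element, pct in failure_pct.items():
    print(f"{element:8s}{element_weight(pct):.2f}")
```

Computed from the percentages as printed, the results agree with the table to within about 0.02. The small differences for a few elements (AuLast, Date, Title) presumably come from rounding in the published percentages.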
19. Results
[Chart: Average of Completeness Score and Average of Success Score, vertical axis from 0.0000 to 1.2000]
Correlation Coefficient .80
Tests conducted on sample of 15,000 OpenURLs randomly pulled from IOTA database
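A correlation coefficient like the one reported here can be computed as a Pearson coefficient between per-OpenURL completeness scores and success outcomes. The slides do not say which coefficient was used, so Pearson is an assumption, and the five data points below are made up for the illustration; they are not IOTA data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (spread_x * spread_y)


# Made-up illustration: completeness score per OpenURL, and 1 if the link resolved.
completeness = [0.2, 0.4, 0.6, 0.8, 1.0]
success = [0, 0, 1, 1, 1]
print(round(pearson(completeness, success), 2))
```

A value near 1 means OpenURLs with higher completeness scores tend to resolve successfully, which is the pattern the .80 figure describes.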
20. IOTA
INTERACTIVE ONLINE TOOL…
23.3+ million OpenURLs processed
Reporting interface
◦ Analyze data elements (metrics) across vendors or
database (Source)
◦ Analyze (Source) for all data elements
24. IOTA
HOW TO USE IT…
◦ The Technical Report provides suggestions for
improving OpenURLs
◦ The interactive tool offers a means to pinpoint
irregularities in data provided on
OpenURLs
◦ The Recommended Practice describes how to
create a Completeness Index
◦ Completeness Index allows OpenURL quality
problems to be quantified
25. IOTA
WHY YOU SHOULD USE IT…
◦ Link resolver vendors can implement the
Completeness Index in their products to help
identify problematic OpenURL sources
◦ Librarians can use suggestions and
Completeness Index to more effectively
communicate quality problems to content
providers
◦ Content providers can use the online
interactive tool to identify problems with the
data they provide
26. IOTA
THE WORKING GROUP…
Adam Chandler (Chair)
Database Management and E-Resources Librarian, Cornell University Library
Rafal Kasprowski
Electronic Resources Librarian, Rice University
Susan Marcin
Licensed Electronic Resources Librarian, Continuing & Electronic Resources
Management Division, Butler Library, Columbia University
Oliver Pesch
Chief Strategist, E-Resource Access and Management Services, EBSCO Information
Services
Clara Ruttenberg
Electronic Resources Librarian, University of Maryland
Elizabeth Winter
Electronic Resources Coordinator, Georgia Tech Library, Collection Acquisitions &
Management Department
Jim Wismer
Manager, Software Engineering, Thomson Reuters
Aron Wolf
Data Program Analyst, Serials Solutions
27. IOTA
CURRENT STATUS…
◦ Technical Report in final draft
◦ Recommended Practice has been submitted
to NISO
◦ Interactive Online Tool remains available
28. Active NISO Initiatives
DAISY Standards
Demand-Driven Acquisition (DDA) of Monographs
Digital Bookmarking and Annotation
E-book Special Interest Group (SIG)
IOTA: OpenURL Quality Metrics
I2 (Institutional Identifiers)
ISO Project 25964
JATS: Journal Article Tag Suite (Also known as Standardized Markup for Journal Articles)
KBART (Knowledge Base and Related Tools) (NISO/UKSG)
NCIP (NISO Circulation Interchange Protocol) Standing Committee
Open Discovery Initiative
PIE-J (Presentation & Identification of E-Journals)
ResourceSync
SERU Standing Committee
Standard Interchange Protocol (SIP)
Supplemental Journal Article Materials (NISO/NFAIS)
SUSHI Standing Committee and SUSHI Servers
Z39.7 (Data Dictionary) Standing Committee
29. References
Active NISO Groups
http://www.niso.org/workrooms/#active
SUSHI Web Site
http://www.niso.org/workrooms/sushi
IOTA Web Site
http://www.niso.org/workrooms/openurlquality
SUSHI Server Registry
https://sites.google.com/site/sushiserverregistry
30. Have an idea for a standard or
recommended practice?
Email…
Nettie Lagace,
Associate Director for Programs, NISO
nlagace@niso.org
THANK YOU!
Editor's Notes
Let’s run through a quick example. This table shows the core elements for an article link, and for the simplicity of this example we will assume all elements are equally important, so each gets a weight of 1. A perfect OpenURL will get the maximum score of 8.
Now let’s look at some OpenURL elements. In this OpenURL we have Date, so we add one point; ISSN, add another point; Volume, another point; Issue, another; and Article Title, another point. The result is a total of 5 points. The calculation is the sum of the weights for this OpenURL divided by the total for all weights, which is five divided by 8, or .625.
We needed a better way of determining the element weights, so we sought help from Phil Davis, a researcher with some experience in statistical modeling. Phil’s suggestion was to perform stepwise regression to see the effect of individual elements on a sample of OpenURLs, and that is what we did. We started with a set of “perfect” OpenURLs: ones that not only included all core data elements, but that also resolved to a full text target on both LinkSource and 360 Link. We used a set of 1500. We then ran several series of tests in which we ran each OpenURL past the link resolver, with a different element removed for each test series. We recorded the failure rates associated with each element. The elements with the higher failure rates are more important to the success of the OpenURL than the ones with lower failure rates. We then used the failure rates as a basis for weights. Then we used the weights and re-ran our 15,000 sample test.
So how’d it turn out? Again, here are numbers for LinkSource. You can see Volume was a key element, with 74% of OpenURLs failing when it was removed. Author last name was not very important, with less than a tenth of a percent failure rate. Date was surprisingly low too. This could be for a few reasons: the level of forgiveness in the holdings matching logic (e.g. treat no date as “any date”), and the ability of the link resolver to discover the date by looking up the article citation in the knowledge base using volume/issue/start page, coupled with the fact that a lot of full text providers don’t use date explicitly in the outbound links.
We created element weights. Rather than use raw failure rates, we used logarithmic values of the failure rates, expressed as the number of failures per 10,000.
Then we ran our 15,000 record sample again. You can see from the graph that the average completeness score and average success score for the OpenURL providers align very closely, and the Correlation Coefficient of these two values across all 15,000 test OpenURLs is .80, which indicates a strong correlation. Good news for the test. This tells us that the Completeness Index can be used as a predictor of OpenURL success from a particular content provider: a low Completeness Index is a good indicator there is a problem.