2. Introduction
Web Scale Discovery in brief & why it matters
Metadata – new ruler of the realm
Life Cycle of Metadata – Publisher as Parent
Evangelical Appeal for Standards
Strategies, Tactics & Pitfalls to Avoid
3. Many terms tossed around…
Federated search, Metasearch, NextGen catalogs, discovery
layers --- and now “Web Scale Discovery Service”
An improved search experience has always been the
motivation behind innovation…
The latest generation of tools is something different.
4. A Definition
A pre-harvested central index coupled with a richly featured
discovery layer providing a single search across a library’s
local, open access, and subscription collections.
…but it’s more than that
5. Not Just Another Search
PDA/DDA are purchasing models that were ahead of
technology's ability to properly accommodate them. The
acquisition systems developed in conjunction with WSD
represent a logical progression of capabilities
Patron-driven acquisition, or PDA, is not new, but it is on the
rise. Approximately 400 to 600 libraries worldwide have
switched to a patron-driven system for purchasing new
works, and that number is likely to double over the next year
and a half (2012)
8. Content is King?
Metadata is the real ruler of the realm
Using descriptions of content to generate purchase and use
is more important now than ever
So, if we know what the target is, how do we create the best
possible metadata?
9. The Black Box
The people who know how these systems work aren’t telling
11. The Basics (More Is Better)
Title
Author
Format
ISBN
Subject categories
Imprint
Link to publisher’s dedicated page
Publication Date
Price
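The "more is better" checklist above can be sketched as a simple completeness test. A minimal sketch, assuming a title record held as a plain dictionary; the field names mirror the slide's list and are my own illustration, not an official BIC Basic schema:

```python
# Hypothetical completeness check: the required fields mirror the slide's
# list of basics; they are an illustration, not the official BIC Basic spec.
REQUIRED_FIELDS = [
    "title", "author", "format", "isbn", "subject_categories",
    "imprint", "publisher_page_url", "publication_date", "price",
]

def missing_fields(record: dict) -> list:
    """Return the basic metadata elements absent or empty in a title record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

# A sparse record, of the kind that underperforms downstream:
record = {
    "title": "An Example Monograph",
    "author": "Smith, John",
    "isbn": "9780306406157",
}
print(missing_fields(record))
```

A feed could run a check like this before syndication, so that gaps are caught while the publisher, the best source of the data, can still fill them.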
12. Data = Sales
Titles that meet the BIC Basic standard see average sales 98%
higher than those that don’t meet the standard
Records with complete BIC Basic data but no image have
average sales…473% [higher] in comparison to those
records which have neither the complete BIC Basic data
elements nor an image.
The difference in average sales between records which…
don’t have enhanced metadata, and records which do…have
enhanced metadata elements is on average over 2,600 units,
which represents an increase of almost 700%
14. How identifiers help
Proper understanding of the customer, whether author,
reader or institution
Provides a simple basis for wider data governance:
Data governance, as defined at Ringgold, is the processes,
policies, standards, organization, and technologies required
to manage and ensure the availability, accessibility, quality,
consistency, auditability, and security of data.
15. The supply chain
Consortium
Author
Submission
and Peer
Review
System
Publisher
Technology
Partner
Subscription
Agent or
Sales Agent
Fulfilment
House or
System
Library
Discovery
Service
WSDs
End User
Data
Syndication
Targets
Consortium
Societies
FundersCitation
16. The supply chain
[Diagram: the same supply chain with certain participants highlighted;
visible labels: Consortium, Submission and Peer Review System,
Technology Partner, Subscription Agent or Sales Agent, Fulfilment House
or System, End User, Societies, Funders, Citation services]
20. Strategy Suggestions
Create the most complete metadata possible
Distribute widely and efficiently
Adhere to standards
Uniquely describe each manifestation of a work
Develop an internal policy to create uniform data across all
published works
21. Practical Tactics
Require Authors to establish an ORCID profile
Create links into content, the more specific the better
Develop concise descriptions of content (not jacket copy)
Include as much as practical – e.g. abstracts of chapters are
often written by the authors themselves
Apply unique identifiers to establish longevity of the
metadata (e.g. ORCID, ISBN, ISSN, DOI, Ringgold ID, ISNI)
Evaluate the benefits of working with outside partners to
assist in metadata development, application and syndication
22. Pitfalls to Avoid
Non-Standardised Naming Conventions
Result: Poorly associated data in the supply chain.
Example 1: Inconsistent author listings, e.g. John Smith, J Smith,
Smith J etc.
Solution: use ORCID numbers
Example 2: Lack of affiliations between authors and institutional
customers.
Solution: use the Ringgold or ISNI number
Example 3: Inability to link author and customer data together.
Solution: use the Ringgold or ISNI number
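The fix for Example 1 above can be made concrete with a short sketch: key contributor records on an ORCID iD rather than on name strings, so variant listings collapse into one identity. The iD below is ORCID's own documented example iD, used here purely for illustration:

```python
# Sketch: keying contributor records on an ORCID iD collapses the
# inconsistent name strings seen downstream into a single identity.
authors_by_orcid = {}

def add_listing(orcid: str, name_as_listed: str) -> None:
    """Group every name variant seen in the supply chain under one iD."""
    authors_by_orcid.setdefault(orcid, set()).add(name_as_listed)

# The same person, listed three different ways by different channels:
for variant in ["John Smith", "J Smith", "Smith J"]:
    add_listing("0000-0002-1825-0097", variant)  # ORCID's example iD

print(len(authors_by_orcid))  # one identity, however many listings
```

The same pattern applies to institutional affiliations keyed on a Ringgold or ISNI number.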
23. Pitfalls to Avoid (continued)
Lack of or Inadequate Subject Classifications and Keywords:
Result: A dramatic negative effect on the positioning of content in
relevancy rankings in discovery or search services
Example 1: Applying non-standard subject classifications causes a
mismatch against what is expected by libraries or end-users
Solution: Understand the standards and best practices applied by current
systems and similar publishers; provide information in a form that will be most
easily utilized by the systems presenting your data
Example 2: DDA sales are lost because subjects were applied without
using an international standard resulting in poor search results among
international users; cross-discipline keywords lacking entirely e.g.
Football in the US does not mean the same as Football in Europe.
Solution: Adopt an internal policy to adhere to an accepted standard at the core of
subject description, and then expand the description using keywords in the
abstract/summary copy.
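The suggested solution has a simple shape: a standard classification at the core of the record, expanded with keywords that bridge regional vocabulary. A minimal sketch; the scheme name (BIC) is real, but the code value is a placeholder, not an actual BIC code:

```python
# Illustration only: standard subject code at the core (placeholder value),
# plus keywords that bridge regional vocabulary, so a US user searching
# "soccer" and a European user searching "football" both find the title.
record = {
    "subjects": [{"scheme": "BIC", "code": "<code from the BIC scheme>"}],
    "keywords": ["soccer", "football", "association football"],
}

def matches(record: dict, query: str) -> bool:
    """A user's search term hits if any keyword contains it."""
    return any(query.lower() in k for k in record["keywords"])

print(matches(record, "soccer"), matches(record, "football"))
```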
24. Pitfalls to Avoid (continued)
Format and versions:
Result: Confusion within sales and distribution channels
Example 1: Users fail to find a compatible format for the title they
want
Solution: Apply ISBNs correctly – unique identifier for each e-edition
Example 2: Citations are incorrect or inconsistent
Solution: Apply version-specific pagination if appropriate
Example 3: Links to content fail over time
Solution: Apply DOIs to establish a persistent and reliable link
Example 4: Data is not fully utilized/indexed by discovery systems
Solution: Output information in industry standard formats (ONIX)
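The DOI solution in Example 3 works because a DOI is resolved through the doi.org proxy rather than pointing at a publisher's site directly, so the link survives site reorganisations. A trivial sketch, using the DOI already cited in this talk's references:

```python
# A DOI becomes a persistent link by prefixing the doi.org resolver;
# the resolver redirects to wherever the content currently lives.
def doi_url(doi: str) -> str:
    return "https://doi.org/" + doi

print(doi_url("10.1080/02763869.2013.749111"))
```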
25. Pitfalls
Lack of high quality information reduces the likelihood that
content will be discovered.
26. References
The Ins and Outs of Evaluating Web-Scale Discovery Services by Athena Hoeppner
http://www.infotoday.com/cilmag/apr12/Hoeppner-Web-Scale-Discovery-Services.shtml
Stakeholders Strive to Define Standards for Web-Scale Discovery Systems By Michael Kelley on October 11, 2012
http://www.thedigitalshift.com/2012/10/discovery/coming-into-focus-web-scale-discovery-services-face-growing-need-for-best-practices
White Paper: The Link Between Metadata and Sales By Andre Breedt, Head of Publisher Account Management; David
Walter, Research and Development Analyst, 2012
http://www.isbn.nielsenbook.co.uk/uploads/3971_Nielsen_Metadata_white_paper_A4(3).pdf
The BIC Basic standards for bibliographic data provision
http://www.bic.org.uk/17/BIC-Basic/
Web-Scale Discovery in an Academic Health Sciences Library: Development and Implementation of the
EBSCO Discovery Service by JoLinda L. Thompson, Kathe S. Obrig & Laura E. Abate,
Medical Reference Services Quarterly, Volume 32, Issue 1, 2013. DOI: 10.1080/02763869.2013.749111
http://www.tandfonline.com/doi/abs/10.1080/02763869.2013.749111
Discoverability Challenges and Collaboration Opportunities within the Scholarly Communications Ecosystem:
A SAGE White Paper Update by Mary M. Somerville, University of Colorado Denver; Lettie Y. Conrad, SAGE.
Collaborative Librarianship Vol 5, No 1 (2013)
Affection for PDA by Steve Kolowich, 2012, Inside Higher Ed
http://www.insidehighered.com/news/2012/06/20/research-foresees-demand-driven-book-acquisition-replacing-librarians-discretion
Good afternoon, my name is Jay Henry and I'm with Ringgold. We are a data services company with offices in Portland and near Oxford, UK. Our business has two main areas of focus: working with publishers to normalize (clean) and uniquely identify their internal data, and, on the other side of the business, providing metadata creation and dissemination services. [may want to mention Book News] Today I will be speaking as an advocate for excellent metadata, and while I believe everything worth creating is worth thoroughly describing, for the purposes of this talk I will focus my comments on scholarly monographs and the basic data elements common to all areas of publishing. The content of this presentation is meant to inform the initiated and educate those new to the concept of "Web Scale Discovery Services", but my focus on metadata should apply to any publishing strategy regardless of the downstream target of your data. I will touch on how the emergence of this technology is enabling new types of acquisition models, highlight the challenges to publishers, and provide some practical information for you to consider when deciding how to approach metadata creation. Please understand, I will be speaking about only a small portion of the supply chain, the part directly related to exposure and discovery of content: specifically, the linkage between publishers and their contributors, intermediaries, libraries and their patrons, and the effect and importance of WSDs in that context. Of course, the benefits of well-formed metadata are so profound as to provide a direct benefit to scholarship. I won't go down the road of making a philosophical argument that publishers have a moral obligation to strive for the highest standards, but you can see how I'm thinking about this topic.
Let's just not forget that good quality metadata has a positive effect in many areas of the supply chain, and natural efficiencies are the result; that should be reason enough to attempt to stay awake at this awkward hour for consciousness. I will be making the case that the emergence of WSD, in conjunction with new acquisition models, represents a real change in the supply chain, one that requires attention from publishers so that their content is in the best possible position to be discovered.
There are a lot of terms tossed around when we talk about search [read terms]. A quick clarification on definitions: are we hearing different names for the same things? No. The term "Web Scale" or "discovery services" is being used throughout the publishing industry as the most recent darling buzzword, and for good reason. Web search utilities (Google, Bing, etc.) have transformed library patron and researcher behavior. "Search" is maturing as a concept and taking on new dimensions within libraries as they strive to compete with mainstream search services. Define Web Scale as the next step in focused, de-cluttered search capability: it provides visibility to resources beyond the library, and it puts more power in the hands of patrons to influence purchasing, not only through DDA, but through their behavior and the extent to which they interact with content (circulation statistics become a means of judging quality and utility, and in turn drive purchase and renewal decisions). About discovery, and systems like these: this is what exposes content, not catalogues or flyers or special promotional emails. What sells and circulates content is putting the right information in front of the right consumer and enabling access. Users (especially librarian buyers) will spend vast amounts of their time with a handful of familiar tools; presenting the right data within those tools should be a top priority for every publisher.
More of a "game changer": I believe WSD services represent a truly mature search technology for libraries, one that will benefit users and the libraries themselves by allowing non-owned resources to be part of the central index. DDA is emerging as an important new way to present title information to patrons. This model delivers what patrons want, and users have driven the adoption of change more than any other factor. The proliferation of WSD goes beyond the main players I mentioned earlier; some system vendors (of current ILS installations within libraries) have begun to integrate WSD services through partnership and technology integration.
Re: web search. The ability to search across the web changed user behavior and expectations. Federated search has been trying to deliver a similar experience, but only now is there the potential to deliver a vastly improved, yet focused, search for academic research. Non-linear lending: might want to mention ProQuest/EBL/ebrary as innovators in experimenting with new acquisition models.
Complexity can be managed by systems; in fact, whenever a need arises, a solution appears. However, the best solutions cannot work with poor quality data: the old cliché of "garbage in, garbage out" still applies. There is more content to describe than ever, and as a result, unique identifiers are the best way to disambiguate and link your data to relevant sources. Metadata has been cooped up for a while and is not feeling its old strength. I'm here to talk about the importance of good quality metadata (and what is meant by "good quality") in the context of web scale discovery systems, not because the term is the flavor of the month, but because these systems matter; this is an important trend that I believe will become the standard model not only within academic institutions, but everywhere. [COUNTER data?] Some publishers are better than others; there is a range, and those doing the best job tend to be the largest and most recognized brands, which increases their ability to ensure their content is discovered. More than ever, descriptive data is a competitive factor.
WSDs: the importance of complete metadata in supporting systems no one really understands. The only solution is to supply as much data as possible, giving the broadest possible description, so that the algorithms at work have the raw material that will ultimately produce hits and increased visibility.
Publishers must drive the creation and initial proliferation of complete, high quality metadata (reference: Nielsen study). Publishers are the first, and should be the best, source of metadata for a title. Still, much of what can and will be added as part of a "description" of a work will be created after the thing is actually published, and so metadata grows within the supply chain over time; those records that have a strong start will be the most utilized and afford the greatest benefit to the publisher. In my introduction I used the term "Publisher as Parent". One thing a good parent provides for its newly created work is a unique name; in the case of monographs it is possible and advantageous to uniquely identify not only the work and its various manifestations, but also content within the work. Deep linking content and expanding the descriptive data associated with each discrete chunk (e.g. chapters) provides an excellent start to a young work's descriptive foundation. Ultimately, publishers benefit from looking at meta-metadata: metrics that allow them to evaluate their publishing strategies and focus on areas where they see greater success or trends in user behavior. Just as important, content will increasingly be judged based on usage; the same data that exposes titles for purchase drives ongoing circulation and renewals.
I've listed the bare minimum here; the BIC bibliographic standard is a good list of what should be supplied, but of course, more is better: always, always, always. The important thing to remember about creating good quality metadata is to adhere to standards and uniquely identify everything possible.
Let's get specific about what kind of metadata is worthy of adjectives like "better" or "complete". Unique identifiers allow content to be disambiguated, internally, externally, etc.; standards grease the wheels of the supply chain.
As noted in my introduction, a good parent provides its newly created work with a unique name, and deep linking with expanded descriptive data gives a young work a strong descriptive foundation. If we take a few of the participants, apply standard identifiers, and adopt a data distribution policy that spreads and enhances the initial record, things begin to change.
Highlight slide
Highlight slide
Highlight slide. Metadata combined with standard identifiers changes the supply chain: systems are merging at an ever increasing rate, and the flow of information across them will be key to exposing content and realizing sales and use of works.
After everything I've said to this point, this slide is really a summary of what I've already advocated: more is better, wide distribution, standards, unique identification, and a policy to create consistent descriptive output. From that strategic foundation more can be done, but this is the minimum. What do I mean by complete? Deep description: chapters, summaries of chapters, links to chapters, images, etc. Efficient distribution means pointing data in directions which "trickle out" and which lead to further enrichment of the description, including the addition of user generated content, reviews, etc. Powerful new tools are now widely available to create clear metrics that provide the basis for better informed decisions by institutional purchasers.
Re: apply unique IDs. "Once uniquely identified, always uniquely identified." Each format through which you publish your book requires its own ISBN, because this thirteen-digit numeral unmistakably identifies the title, edition, binding, and publisher of a given work. So your paper book will have its own ISBN, the audiobook will have its own ISBN, and the ebook its own ISBN. Re: evaluate. Many publishers have the resources to do a good job and are doing so; others simply don't have the resources to put a complete plan together and execute. Nonetheless, creating the best possible data for your content is critical regardless of how it's accomplished.
“ Once uniquely identified, always uniquely identified”… by definition
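Since the notes above lean on the thirteen-digit ISBN, here is a short sketch of how its built-in check digit catches transcription errors before bad identifiers enter the supply chain. This is the standard ISBN-13 algorithm (alternating 1/3 weights, sum divisible by 10); the sample number is a commonly used example ISBN, not a claim about any real title:

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Validate an ISBN-13 check digit: weight digits 1,3,1,3,...
    and accept the number only if the weighted sum is divisible by 10."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

print(isbn13_is_valid("978-0-306-40615-7"))  # True: valid check digit
print(isbn13_is_valid("978-0-306-40615-8"))  # False: one digit off
```

Running a check like this at data entry is a cheap way to honor "once uniquely identified, always uniquely identified."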