The Future of Metadata Management & Making Library Collections Discoverable on the Web
1. The Future of Metadata Management & Making Library Collections Discoverable on the Web
Ted Fons, OCLC
The National Library Descriptors Conference
Proposed Changes in the Development of Library Collections in the Era of the Semantic Web
Warsaw - 21-22 April, 2015
4. The Goal
S.R. Ranganathan
1. Books are for use.
2. Every reader his [or her] book.
3. Every book its reader.
4. Save the time of the reader.
5. The library is a growing organism.
Image credit: http://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2009/3/23/1237806064989/Young-man-
Connect the reader to content.
6. How We Work Today
Local
Group
Global
We catalog:
• Books
• Music
• Journal titles
• Authorities
7. What is in a Global Discovery System
Readers want:
• eBooks
• Articles
• Unique content
We catalog:
• Books
• Music
• Journal titles
• Authorities
8. • Calhoun: “Metadata has changed as collections have changed. It remains important, but it comes in many forms and from many sources. The centrality of bibliographic control has been disrupted.” p. 15.
• And: “There is less need and place for traditional bibliographic control as a set of methods for providing [metadata] for discovery, access and management of the content of mainstream books and serials.” p. 24.
Catalogue 2.0 by Karen Calhoun
9. “Ken Chad examines the distinction between redundant cataloging (re-editing records to suit local practices) and redundant catalogs [in the UK]. He enumerates the benefits of moving from … 160 standalone catalogues to a single shared catalogue at the network level for all of these libraries”
Karen Calhoun in Catalogue 2.0
Duplicating records for local purposes
10. The world’s libraries. Connected.
What is in a Global Discovery System
Readers want:
• eBooks
• Articles
• Unique content
We catalog:
• Books
• Music
• Journal titles
• Authorities
11. The value of authorities
FRAD Tasks
Find
Identify
Clarify
Contextualize
http://www.ifla.org/publications/functional-requirements-for-authority-data
12. What is in a Global Discovery System
So, what should we do?
1. Catalog unique materials
2. Create authorities
3. Use harvesting and data mining for everything else
19. The Web of …
Documents
Active Documents
Discovery
Data
Knowledge
Libraries can connect to the web of knowledge
20. The Knowledge Graph
Libraries can connect to the web of knowledge
Libraries can create a knowledge graph
Documents
Entities
23. Establishing Semantic Identity for Accurate Representation on the Web
12/09/2014
Kenning Arlitsch, Dean of the Library
Patrick OBrien, Semantic Web Research Director
24. The Point
Libraries are poorly defined and represented on
the Semantic Web…
…but we know how to fix that problem…
…mostly
30. Summary
• Define library organization in Wikipedia
– Beware of *pedia culture and process
• Engage with other trusted data sources
– Freebase
– Google Places/Google My Business
– Google+
• Mark up metadata with Schema.org
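The last point of the summary, marking up metadata with Schema.org, might look roughly like the following. This is a minimal sketch, not the markup used by any particular library: the function name `library_jsonld` and every value passed to it (name, URLs, phone number, address) are placeholders invented for illustration. It builds a JSON-LD description using the Schema.org `Library` and `PostalAddress` types, which could then be embedded in a library home page.

```python
import json


def library_jsonld(name, url, telephone, street, city, region, postal_code, same_as):
    """Build a Schema.org description of a library as a JSON-LD dictionary.

    The helper and its parameters are hypothetical; only the Schema.org
    type and property names come from the Schema.org vocabulary.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Library",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
        # sameAs points at other pages that describe the same organization,
        # for example an encyclopedia article or a business listing.
        "sameAs": list(same_as),
    }


# Placeholder values only; none of these describe a real library.
example = library_jsonld(
    name="Example University Library",
    url="https://library.example.edu/",
    telephone="+1-555-0100",
    street="1 Campus Drive",
    city="Exampleville",
    region="MT",
    postal_code="00000",
    same_as=["https://encyclopedia.example.org/wiki/Example_University_Library"],
)

# The serialized result would typically go inside a
# <script type="application/ld+json"> element on the library's web page.
print(json.dumps(example, indent=2))
```

Keeping the address, phone number and `sameAs` links in one structured description is one way to give search engines a consistent statement of the library's identity.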
31. The Knowledge Graph
Libraries can connect to the web of knowledge
Libraries can create a knowledge graph
Documents
Entities
33. The library knowledge graph
Entities: person, place, object, concept, organization, work
FRBR Entities
Work
Expression
Manifestation
Item
http://www.ifla.org/publications/functional-requirements-for-bibliographic-records
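The four FRBR entities listed on this slide nest inside one another, and a small data structure can make that concrete. The sketch below is an illustrative simplification, not the FRBR model itself: the attribute names (`language`, `format`, `holding_library`) and the sample data are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single copy, for example one held by a particular library."""
    holding_library: str  # hypothetical attribute


@dataclass
class Manifestation:
    """A published embodiment, for example a print edition or an ebook."""
    format: str  # hypothetical attribute
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of the work, for example a translation."""
    language: str  # hypothetical attribute
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract creation that all expressions share."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def languages(self):
        """Languages in which the work is available."""
        return sorted({e.language for e in self.expressions})

    def formats(self):
        """Formats available across all expressions."""
        return sorted({m.format for e in self.expressions for m in e.manifestations})


# Invented sample data for illustration.
work = Work(
    title="Example Novel",
    expressions=[
        Expression("en", [Manifestation("print", [Item("Library A")]),
                          Manifestation("ebook")]),
        Expression("pl", [Manifestation("audiobook", [Item("Library B")])]),
    ],
)

print(work.languages())
print(work.formats())
```

Grouping at the Work level is what lets a discovery interface show one entry for a title and then list its languages and formats beneath it.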
34. Example of benefits: Discovery
The Name of the Rose
Summary: The year is 1327. Franciscans in a wealthy Italian abbey are suspected of heresy, and Brother William of Baskerville arrives to investigate. His delicate mission is suddenly overshadowed by seven bizarre deaths that take place in seven days and nights of apocalyptic terror.
Subjects: Monastic libraries -- Italy -- Fiction | Semiotics -- Fiction
Borrowing Options: eBooks | Printed Books | Audio Books
Other Languages
35. Example of Benefits: Web Exposure
data.BnF.fr
[Chart: Visits to WorldCat. Number of visits per month, January to October, for 2012, 2013 and 2014; vertical axis from 0 to 8,000,000.]
37. The Data Strategy: WorldCat Entities
Work and Person Creation Process Flow
There are three components to the pipeline for creating Work and Person entities. The harvest component extracts the data from the different sources. The map component identifies the objects and combines the triples through name recognition and authority linkages. The reduce component pulls together the entity descriptions and writes them out to HBase.
Sources: Enhanced WC Records, VIAF, LCNAF, DBPedia, Datamining
1. Harvest: Extractors, Harvested Triples
2. Map: ObjectMapper, PersonCombine, WorkCombine, Refined Triples
3. Reduce: CreateWorkReducer, CreatePersonReduc
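The three components described on this slide (harvest, map, reduce) can be sketched in a few lines of plain Python. This is only an in-memory illustration under stated assumptions, not OCLC's implementation: the function names, the sample records and the `same_as` lookup table are all invented here. In particular, the lookup table stands in for the name recognition and authority linkage step, and the result is returned as a dictionary rather than written to HBase.

```python
from collections import defaultdict


def harvest(sources):
    """Harvest: flatten every source into (subject, predicate, object) triples."""
    triples = []
    for records in sources.values():
        for subject, properties in records.items():
            for predicate, obj in properties.items():
                triples.append((subject, predicate, obj))
    return triples


def map_triples(triples, same_as):
    """Map: key each triple by a canonical entity id.

    `same_as` is a hypothetical precomputed table from source record ids
    to entity ids; unknown subjects keep their own id.
    """
    for subject, predicate, obj in triples:
        yield same_as.get(subject, subject), (predicate, obj)


def reduce_entities(keyed):
    """Reduce: gather all statements about one entity into one description."""
    entities = defaultdict(lambda: defaultdict(set))
    for entity_id, (predicate, obj) in keyed:
        entities[entity_id][predicate].add(obj)
    return {
        entity_id: {p: sorted(values) for p, values in props.items()}
        for entity_id, props in entities.items()
    }


# Invented sample data: one catalog record and one authority record
# that refer to the same person.
sources = {
    "catalog": {"rec:1": {"name": "Doe, Jane"}},
    "authority": {"auth:9": {"name": "Jane Doe", "birthYear": "1950"}},
}
same_as = {"rec:1": "person:1", "auth:9": "person:1"}

entities = reduce_entities(map_triples(harvest(sources), same_as))
print(entities)
```

The point of the split is that harvesting and mapping can run independently per record, while the reduce step only needs everything that shares an entity id.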
38. The Work Entity
• 197+ million Work descriptions and URIs
• Schema.org + BiblioGraph.net
• RDF Data formats
– RDF/XML, Turtle, Triples, JSON-LD
• Links to WorldCat manifestations
• Links to Dewey, LCSH, LCNAF, VIAF, FAST
• Open Data license via Linked Data Explorer
• 2015: Discovery API, Metadata API
• Released April 2014
http://www.oclc.org/data
39. The Person Entity
• 98+ million Person descriptions and URIs
– Person entities with authority: 20.2 million
– Person entities without authority: 78.3 million
• Schema.org + BiblioGraph.net
• Harvested from WorldCat data and enriched from other hubs
• RDF Data formats
– RDF/XML, Turtle, Triples, JSON-LD
• Links to WorldCat Works. Added links from WC Works.
• Open Data license via Linked Data Explorer
• 2015: Linked Data Explorer, Discovery API
http://www.oclc.org/data
The Web has evolved and continues to evolve:
Linked documents, then documents built on the fly from databases, then search engines analyzing the links to create discovery, then sites starting to publish the [linked] data behind the documents.
How have libraries engaged with the web?
Enthusiastic and leading for documents; actively disengaged from the search engines (technology issues and commercial concerns); partial engagement with the web of data.
A Web of knowledge is forming as the search engines analyze the relationships in the data. How will libraries participate?
Google's Knowledge Graph: navigating between entity descriptions.
0 - Search for MSU in 2012
1 - On the left, traditional organic search results
2 - Notice the poor description of the MSU Library
3 - Google’s Knowledge Card of the “Thing” they believe to be Montana State University Library
4 - However, it has the wrong phone number, wrong city, wrong address, and wrong map
0 - After doing research in this area, we have concluded that a library must establish and maintain its semantic identity on the Web. This is the same search for Montana State University Library in 2014. Let's walk through the changes on this slide and then talk about how they came about in the rest of the presentation.
1 - Improved description of the library
2 - Correct address, phone number, and a Google map link
3 - More links to key areas of our web site, as determined by Google’s algorithms
4 - Link to more results from Montana.edu
5 - Link to our G+ page with a picture of our building and the number of followers
6 - The correct MSU Library logo
7 - Link to a robust Wikipedia description of the MSU Library
We at OCLC apply our major data ingest and processing techniques (Big Data tech):
Matching incoming data with what we have
Identifying the entities and associating their role attributes
Works: so far not very visible in libraries, but important on the web
Building a graph of relationships
Data to underpin innovation! A person knowledge card in a prototype WorldCat Discovery interface
Refined from 320M harvested entities.
If there is a 100 or 700 field for a Person entity, then there will be a BY relationship (creator, contributor, author, illustrator, etc.) in the WC Work description that includes a WC Person URI.
If there is a 600 field for a Person entity, then there will be an ABOUT relationship (subject, etc) in the WC Work description that includes a WC Person URI.
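The two rules above amount to a small lookup from MARC field tag to relationship type. The following is a minimal sketch of that mapping only: the function name, the input shape (a list of tag and URI pairs) and the example.org URIs are assumptions made for illustration, and real records would need proper MARC parsing and entity matching first.

```python
# MARC tags named in the notes: 100 and 700 lead to BY, 600 leads to ABOUT.
BY_TAGS = {"100", "700"}
ABOUT_TAGS = {"600"}


def person_relationships(fields):
    """Turn (marc_tag, person_uri) pairs into (relationship, person_uri) pairs.

    Fields with any other tag are ignored. The input format is hypothetical.
    """
    relationships = []
    for tag, person_uri in fields:
        if tag in BY_TAGS:
            relationships.append(("BY", person_uri))
        elif tag in ABOUT_TAGS:
            relationships.append(("ABOUT", person_uri))
    return relationships


# Invented example: placeholder URIs, not real WorldCat Person URIs.
fields = [
    ("100", "http://example.org/person/1"),
    ("245", None),
    ("600", "http://example.org/person/2"),
    ("700", "http://example.org/person/3"),
]

for relationship, uri in person_relationships(fields):
    print(relationship, uri)
```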
Other sources:
After creating the set of Person entities, we started the process of enriching the entities with data harvested from other sources - images and other information from DBPedia, preferred names from LC, see also links from VIAF, and profile information (subjects, genres, and roles most known for) from WC Identities.