1. Challenges for the New Era
Diane I. Hillmann
Metadata Management Associates
Oslo, February 8, 2013
2. Big Challenges/Big Ideas
O Changing our thinking from records to statements
O Will RDA help?
O Where you start affects where you end up
O Shifting our ways from ROI to Potlatch
O Recognizing that our human resources are limited
O So how do we manage this data-that-isn't-records?
Oslo 2013 2/7/13 2
3. Statements and Records
O Records are still important, but not as we've used them in the past
O We might want to think about records as the instantiation of a point of view
O News: traditional library data has a point of view
O MARC required consensus because of limitations built into the technology
O Now we need provenance, so we know "Who said?"
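The shift from record-level to statement-level thinking, with provenance attached, can be sketched in a few lines of plain Python. This is an illustrative model only: the property names, URIs, and source labels are invented, and real RDF tooling would use named graphs or reification for provenance.

```python
# A "statement" is a single assertion about a resource, tagged with
# its source so we can always answer "Who said that?"
from collections import defaultdict

statements = [
    ("book:123", "dct:title",   "Moby-Dick",        "src:LibraryA"),
    ("book:123", "dct:creator", "Melville, Herman", "src:LibraryA"),
    ("book:123", "dct:creator", "Herman Melville",  "src:PublisherFeed"),
    ("book:123", "dct:date",    "1851",             "src:PublisherFeed"),
]

def record_view(subject, trusted_sources=None):
    """Aggregate statements about one subject into a record-like view.
    The 'record' is just one point of view: the sources we chose to trust."""
    view = defaultdict(list)
    for s, p, o, source in statements:
        if s == subject and (trusted_sources is None or source in trusted_sources):
            view[p].append(o)
    return dict(view)

# Two different 'records' for the same resource, depending on trust:
print(record_view("book:123"))
print(record_view("book:123", trusted_sources={"src:LibraryA"}))
```

The same pool of statements yields different "records" for different consumers; the record is an output of a point of view, not the unit of storage.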
4. Building RDVocab: Goals
O Bridge the XML and RDF worlds
O Ensure the ability to map between RDA and other element sets
O Provide a sound platform for extending the RDA Vocabularies into new and specialized domains
O Consider methods for expressing AACR2 structures in technical ways to ease the pain of transition to RDA
5. RDVocab Structure, Simplified
O RDA properties declared in two separate hierarchies:
O An 'unconstrained' vocabulary, with no explicit relationship to FRBR entities
O A subset of classes, properties and subproperties with FRBR entities as 'domains'
O Pros: retained usability in or out of libraries; better mapping to/from non-FRBR vocabularies
O Cons: still seems too complex to many SemWeb implementers (many using BIBO)
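The two-hierarchy design can be sketched as a lookup from a constrained property to its FRBR-free parent. The prefixes and property names below are schematic stand-ins, not the actual RDA Registry URIs.

```python
# Constrained properties carry a FRBR entity as their domain and
# point, via subPropertyOf, at an unconstrained parent that carries
# no FRBR commitment at all.
constrained = {
    "rda-m:titleProper": {"domain": "frbr:Manifestation",
                          "subPropertyOf": "rdau:titleProper"},
    "rda-w:formOfWork":  {"domain": "frbr:Work",
                          "subPropertyOf": "rdau:formOfWork"},
}

def unconstrained_form(prop):
    """Map a constrained property to its FRBR-free parent, which is
    what non-FRBR consumers and cross-domain mappings should use."""
    return constrained[prop]["subPropertyOf"]

print(unconstrained_form("rda-m:titleProper"))
```

Library applications use the constrained form and get FRBR semantics for free; everyone else maps to and from the unconstrained parent without inheriting the FRBR model.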
6. Why Unconstrained Properties?
O The 'bounded' properties should be seen as the official JSC-defined RDA Application Profile for libraries
O What's still lacking is the addition of the necessary constraints: datatypes, cardinality, associated value vocabularies
O Extensions and mappings should be built from the unconstrained properties
O Unconstrained vocabularies are necessary in domains where FRBR is not assumed or is inappropriate
O Mapping from vocabularies that don't use the FRBR model directly to ones that do (and back) creates serious problems for the 'Web of Data'
7. The Simple Case: One Property, One FRBR Entity
[Diagram: on the Semantic Web side, a generalized property with no FRBR relationship; below it, on the library-applications side, a subproperty with a relationship to one FRBR entity, pointing at that entity.]
8. The Not-So-Simple Case: One Property, More Than One FRBR Entity
[Diagram: one generalized property with no FRBR relationship on the Semantic Web side; on the library-applications side, two subproperties, each with a relationship to a different FRBR entity.]
9. Roles: Attributes or Properties?
O In 2005, the DC Usage Board worked with LC to build a formal representation of the MARC Relators so that these terms could be used with DC
O This work provided a template for the registration of the role terms in RDA (in Appendix I) and, by extension, the other RDA relationships
O Role and relationship properties are registered at the same level as elements, rather than as attributes (as MARC does with relators, and RDA does in its XML schemas)
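The attribute-versus-property distinction can be made concrete. The MARC Relators vocabulary is real, but the URIs and prefixes below are illustrative, not the registered forms.

```python
# Attribute style (as MARC does with relators): the role is a code
# attached to a name inside one field of one record.
marc_style = {"field": "700", "name": "Tenniel, John", "relator": "ill"}

# Property style (as the DC/RDA registration does): the role IS the
# property, a first-class element linking resource to agent.
property_style = ("book:alice", "rel:ill", "person:tenniel")

# A property-level role slots straight into statement-based queries:
statements = [property_style,
              ("book:alice", "dct:creator", "person:carroll")]
illustrators = [o for s, p, o in statements if p == "rel:ill"]
print(illustrators)
```

As an attribute, the role is invisible to generic processors; as a property, it can be queried, subclassed, and mapped like any other element.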
10. Vocabulary Extension
O The inclusion of unconstrained properties provides a path for extending RDA into specialized library communities and non-library communities
O They may have a different notion of how FRBR 'aggregates' (for example, a colorized version of a film may be viewed as a separate work)
O They may not wish to use FRBR at all
O They may have additional, domain-specific properties that could benefit from a relationship to the RDA properties
11. RDA:adaptedAs
[Diagram: RDA:adaptedAs with one subproperty, RDA:adaptedAsARadioScript.]

12. Extension using Unconstrained Properties
[Diagram: RDA:adaptedAs with subproperties RDA:adaptedAsARadioScript and KidLit:adaptedAsAPictureBook.]

13. Extension using Unconstrained Properties
[Diagram: RDA:adaptedAs with subproperties RDA:adaptedAsARadioScript, KidLit:adaptedAsAPictureBook, and KidLit:adaptedAsAChapterBook.]
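The extension pattern in these diagrams amounts to a subproperty closure: a query for the broad unconstrained property also retrieves statements made with the specialized extensions. The KidLit namespace is the hypothetical extension from the slides; the example resources are invented.

```python
# subPropertyOf relations: extension properties point at the broad
# unconstrained RDA property.
sub_property_of = {
    "RDA:adaptedAsARadioScript":    "RDA:adaptedAs",
    "KidLit:adaptedAsAPictureBook": "RDA:adaptedAs",
    "KidLit:adaptedAsAChapterBook": "RDA:adaptedAs",
}

statements = [
    ("work:moby-dick",  "KidLit:adaptedAsAPictureBook", "work:moby-pics"),
    ("work:war-worlds", "RDA:adaptedAsARadioScript",    "work:cbs-1938"),
]

def ancestors(prop):
    """All broader properties of prop, following subPropertyOf."""
    result = set()
    while prop in sub_property_of:
        prop = sub_property_of[prop]
        result.add(prop)
    return result

def query(broad_prop):
    """Find statements whose property is broad_prop or narrower."""
    return [(s, o) for s, p, o in statements
            if p == broad_prop or broad_prop in ancestors(p)]

print(query("RDA:adaptedAs"))
# Both statements match, though neither uses RDA:adaptedAs directly.
```

This is why extensions built on the unconstrained properties stay interoperable: consumers who only know RDA:adaptedAs still see the KidLit data.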
14. Where You Start Affects Where You End Up
O Simple metadata is more useful as output than as input
O The 'long tail' of MARC's lesser-used properties was built up over decades and shouldn't be discarded
O Easier to dumb down than smarten up
O Dublin Core and MARC are examples of starting simple and trying to add on
O Distribution models are important
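"Easier to dumb down than smarten up" is a statement about mapping direction: collapsing specific properties into a broad one is a simple lookup, but the inverse is one-to-many and cannot be resolved automatically. The property names here are schematic, not the registered URIs.

```python
from collections import defaultdict

# Dumbing down: many specific properties collapse into one broad one.
dumb_down = {
    "rdau:titleProper":           "dc:title",
    "rdau:parallelTitle":         "dc:title",
    "rdau:otherTitleInformation": "dc:title",
}

rich = [("rdau:parallelTitle", "Vingt mille lieues sous les mers")]
simple = [(dumb_down[p], v) for p, v in rich]
print(simple)  # the distinction 'parallel title' is gone

# Smartening up needs the inverse map, which is one-to-many:
smarten_up = defaultdict(list)
for specific, broad in dumb_down.items():
    smarten_up[broad].append(specific)
print(smarten_up["dc:title"])  # three candidates; no way to pick one
```

Starting rich and distributing simplified views is lossless for the publisher; starting simple throws the distinctions away permanently.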
15. Values vs. Costs
O Machines cost less than people, but they can't replace people
O Computers tend to require instructions from people to work well
O But they are more consistent than people!
O ROI culture vs. Potlatch culture
O Is 'Who pays for this?' the right question?
16. The Management Conundrum
O Traditional ILSs haven't worked for us for a long time
O They were built to create and manage catalog data
O We can no longer invest in the catalog paradigm
O Libraries are data builders, data managers, data distributors
O The centralized, master-record model is as dead as MARC encoding
18. Linked Data Is Inherently Chaotic
O Requires creating and aggregating data in a broader context
O There is no one 'correct' record to be made from this, no objective 'truth'
O This approach is different from the cataloging tradition
O BUT the focus on vocabularies is familiar
O In the SemWeb world, vocabularies are more complex than the thesauri we know
19. Model of 'the World': XML
O XML assumes a 'closed' world (domain), usually defined by a schema:
O "We know all of the data describing this resource. The single description must be a valid document according to our schema. The data must be valid."
O XML's document model provides a neat equivalence to a metadata 'record' (and most of us are fairly comfortable with it)
20. Model of 'the World': RDF
O RDF assumes an 'open' world:
O "There's an infinite amount of unknown data describing this resource yet to be discovered. It will come from an infinite number of providers. There will be an infinite number of descriptions. Those descriptions must be consistent."
O RDF's statement-oriented data model has no notion of a 'record' (rather, statements can be aggregated for a fuller description of a resource)
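The open-world contrast can be shown in a few lines: in a statement model, a description is just a growing union of assertions, and a new, previously unknown provider can always add more without invalidating anything. URIs and provider data are illustrative.

```python
# Open-world aggregation: each provider contributes statements;
# the merged description is their union, never a single closed record.
provider_a = {("book:42", "dc:title", "Hamlet")}
provider_b = {("book:42", "dc:creator", "Shakespeare, William"),
              ("book:42", "dc:title", "Hamlet")}  # duplicate merges away

description = provider_a | provider_b
assert len(description) == 2  # sets de-duplicate identical statements

# A later, unknown provider can always add more; nothing breaks:
provider_c = {("book:42", "dc:date", "1603")}
description |= provider_c
print(sorted(description))
```

Contrast this with the XML model on the previous slide, where a new field that the schema doesn't know about makes the whole document invalid.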
21. The New Management Strategy
O Statement-level rather than record-level management
O Emphasis on evaluation coming in and provenance going out
O Shift in human effort from creating standard cataloging to knowledgeable human intervention in machine-based processes
O Extensive use of data created outside libraries
O Intelligent re-use of our legacy data
22. Is MARC Dead?
O The communication format is very dead (based on standards no longer updated)
O The semantics are not dead
O They represent the distillation of decades of descriptive experience
O As we move into a more machine-assisted world, our old concerns about the size of our legacy can be addressed
O Taking the legacy records with us should be based on solutions developed using open and transparent strategies
23. What's Our Distribution Model?
[Diagram: two contrasting speech bubbles]
O "We don't know what you want, so choose!"
O "We know more about what you want than you do. Here it is!"
24. Libraries as Data Publishers & Consumers
O Data from library 'publishers' should look like a supermarket: lots of choices, with decisions made by consumers
O Right now we seem to be operating as Soviet bakeries
O This is not what open linked data is supposed to be doing for us
O "Be conservative in what you send, liberal in what you accept" (the Robustness Principle)
25. Our Goals as Data Publishers
O If we want people outside libraries to use our data, we need to offer them choices
O This strategy is based on mapping all of our legacy data
O Not a selection
O Filtering is accomplished by data consumers, who know best what they need
O This requires active innovation and a new understanding of how to manage the data
26. Our Goals as Data Consumers
O As aggregators of relevant metadata content
O Developing methods to gather and redistribute without necessarily re-creating OCLC
O Modeling and documenting best practices in metadata creation, improvement and exposure
O Application profiles are important in this effort
O As developers of vocabularies exposing a variety of bibliographic relationships
O As innovators in using social networks to enhance bibliographic description
27. Mapping Legacy Data for Re-distribution
O If we want data consumers to value our data, we should map it all
O We can distribute limited 'flavors' as well, as we gain experience and feedback
O Current mapping strategies are based on:
O One-time, inflexible, programmatic methods that effectively hide the process from consumers
O Assumptions that data must be improved at the time it is mapped, or never
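"Map it all, let consumers filter" can be sketched as a published mapping table plus consumer-side selection. The MARC tags are real; the target property names are illustrative, not registered URIs.

```python
# Publish the complete mapping, not a curated subset; the map itself
# is data, so consumers can see exactly how each field was treated.
marc_to_rdf = {
    "245$a": "rdau:titleProper",
    "100$a": "rdau:creator",            # illustrative target names
    "260$c": "rdau:dateOfPublication",
    "300$a": "rdau:extent",
}

record = {"245$a": "Moby-Dick", "100$a": "Melville, Herman",
          "260$c": "1851", "300$a": "xv, 635 p."}

# Publisher side: map everything, hide nothing.
full_output = {marc_to_rdf[tag]: value for tag, value in record.items()}

# Consumer side: each consumer filters down to the 'flavor' it needs.
wanted = {"rdau:titleProper", "rdau:creator"}
flavor = {p: v for p, v in full_output.items() if p in wanted}
print(flavor)
```

Because the mapping table is exposed rather than buried in a one-time conversion script, consumers can audit it, and it can be revised iteratively as feedback comes in.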
30. [Dialogue slide]
O "If we don't distribute our best data, how can anybody do cool stuff with it? Isn't that what we want?"
O "We can use the cool stuff ourselves!"
34. Harvest/Ingest Plan
O Choosing data sources
O There are known sources out there; some of them are of good quality, others are usable with improvement
O Tools are needed to help pull data, validate it, cache it, and set it up for evaluation
O Most of these tasks can and should be set up as automated processes, with alerts to human minders when something goes wrong
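The automated-with-alerts pattern is a minimal loop: pull, validate, cache, and surface only the failures to a human. The validation rule and sample feed below are invented for the sketch.

```python
# A minimal harvest/ingest loop: good data flows straight to the
# cache; only validation failures are escalated to human minders.

def validate(record):
    """A stand-in quality check: every record needs a non-empty title."""
    return bool(record.get("title"))

def ingest(source_records):
    cache, alerts = [], []
    for rec in source_records:
        if validate(rec):
            cache.append(rec)    # automated path
        else:
            alerts.append(rec)   # humans only see what went wrong
    return cache, alerts

feed = [{"id": 1, "title": "Hamlet"},
        {"id": 2, "title": ""},  # broken record -> alert
        {"id": 3, "title": "Kristin Lavransdatter"}]

cache, alerts = ingest(feed)
print(len(cache), "cached,", len(alerts), "flagged for a human")
```

The point of the design is the ratio: the machine handles the bulk, and scarce human attention is spent only on the exceptions.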
36. Metadata Evaluation
O Evaluation needs to scale well beyond random sampling
O Statistical and data-mining tools need to be brought into the process, to provide both an 'overview' and specifics of whole data sets
O Improvement specifications, techniques, quality criteria and tools need to be iterative, granular, and shareable
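Scaling evaluation beyond sampling can start with something as simple as per-field completeness computed over every record, which gives the 'overview' that a random sample only estimates. The field names and records are illustrative.

```python
# Profile a whole data set: per-field completeness as a fraction
# of records containing a non-empty value for that field.
from collections import Counter

def profile(records):
    counts = Counter()
    for rec in records:
        for field, value in rec.items():
            if value:                    # count only non-empty values
                counts[field] += 1
    n = len(records)
    return {field: count / n for field, count in counts.items()}

data = [{"title": "A", "creator": "X", "date": "1990"},
        {"title": "B", "creator": "",  "date": "1991"},
        {"title": "C", "creator": "Y", "date": ""}]

print(profile(data))
```

Because the profile is just data, it can be shared alongside the data set, compared across sources, and refined iteratively, which is exactly the granular, shareable quality criterion the slide asks for.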
38. Testing, Monitoring & Re-evaluation
O Data will change, and processes must be able to detect that, based on data profiles
O Human intervention should be limited
O Tools need to be built so that non-programmers can run them
O Reading logs, monitoring error reports, checking results, and writing specs can and should be done by data specialists (a.k.a. catalogers with training)
O Looking for opportunities for programmers and catalogers to learn together is essential!
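Profile-based change detection can be sketched as comparing a baseline completeness snapshot against a fresh one and flagging fields that drift past a tolerance. The tolerance value and field names are illustrative choices, not a standard.

```python
# Flag fields whose completeness moved more than a tolerance, so a
# data specialist (not a programmer) can decide what to do about it.
def drift(baseline, current, tolerance=0.05):
    flagged = {}
    for field in set(baseline) | set(current):
        before = baseline.get(field, 0.0)
        after = current.get(field, 0.0)
        if abs(after - before) > tolerance:
            flagged[field] = (before, after)
    return flagged

baseline = {"title": 1.00, "creator": 0.90, "date": 0.75}
current  = {"title": 0.99, "creator": 0.60, "date": 0.76}

print(drift(baseline, current))
# creator fell from 90% to 60% -> worth a human look
```

A tool like this is runnable by a non-programmer: the inputs are two profiles, the output is a short list of fields needing attention.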
40. Re-distribution Plan
O If we improve data, we need to expose how we did it (and what we did), for the use of downstream consumers
O New metadata provenance efforts are designed to do this at the statement level
O This strategy can only exist successfully where open licenses allow innovation and wide re-use
O Ideally, distribution AND redistribution should be accomplished with Application Profiles
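Statement-level exposure of improvements can be sketched by pairing each corrected statement with a note recording what was replaced and how. The structure and method name are invented for the example; real provenance vocabularies are richer.

```python
# Each improvement is published alongside the statement it changed,
# so downstream consumers can accept or reject it independently.
def improve(statement, new_object, method):
    s, p, old = statement
    improved = (s, p, new_object)
    provenance = {"replaces": old, "method": method}
    return improved, provenance

stmt = ("book:7", "dc:date", "19.51")            # garbled date value
fixed, prov = improve(stmt, "1951", method="date-normalizer")

print(fixed)   # the corrected statement
print(prov)    # what was replaced, and by which process
```

Because the change is recorded per statement rather than per record, a consumer who distrusts the normalizer can keep the original value while still accepting every other improvement.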
41. Will This Shift Cost Too Much?
O It's the human effort that costs us
O The cost of traditional cataloging is far too high, for increasingly dubious value
O Our current investments have reached the end of their usefulness
O All the possible efficiencies for traditional cataloging have already been accomplished
O Waiting for leadership from the big players costs us valuable time, with no guarantees of results
O We need to figure out how to invest in more distributed innovation and focused collaboration
42. What About the Millions?
O Our legacy MARC data is already a 'graph', but the resources defined there have no internet-resolvable identity
O But even the transcribed text can be hugely valuable, with effort and software to help
O Projects like the eXtensible Catalog have made an excellent start in demonstrating this point
O MARC 21 is already available as basic RDF
43. The Bottom Line
O Our big investment is (and has always been) in our data, not our systems
O Over many changes in the format of materials, we've always struggled to keep our focus on the data content that endures, regardless of presentation format
O We are in a great position to influence how the future develops, but we can't be afraid to change, or afraid to fail