Presentation given to the Cataloging Norms IG at ALA Midwinter in Dallas. Describes the differences between managing metadata at the record level and at the statement level.
2. What’s different about statement data?
Library data compliance has been defined by consensus since MARC was a pup
But outside the MARC silo we need different strategies
To accomplish this we need to look at value, costs and investments very differently
Flickr photo by Robert Jagendorf ALA Dallas, 1/20/12
3. What Are Statements?
• A MARC record can be viewed as an aggregation of statements
• All the attribute = value pairs relate to the same resource
• In a linked data world, statements are disaggregated and each carries its relationship to a resource as the ‘subject’ of its triple
• Though it seems more complicated to deal with statements in isolation, it is really simpler (the complication is that we still know little about it)
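As a rough illustration of the points above, the sketch below splits one record (a set of attribute = value pairs about a single resource) into standalone statements that each carry the resource identifier as their subject. The identifier, property names and values are hypothetical and are not drawn from MARC or from any real vocabulary.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # identifier of the resource being described
    predicate: str  # the attribute (property) name
    value: str      # the attribute's value


def disaggregate(resource_id: str, record: dict[str, list[str]]) -> list[Statement]:
    """Turn one record (attribute -> values) into standalone statements."""
    return [
        Statement(resource_id, attribute, value)
        for attribute, values in record.items()
        for value in values
    ]


# Hypothetical record: every pair below describes the same resource.
record = {
    "title": ["Moby Dick"],
    "creator": ["Melville, Herman"],
    "subject": ["Whaling", "Sea stories"],
}

statements = disaggregate("http://example.org/book/1", record)
for statement in statements:
    print(statement)
```

Each printed statement can now be stored, evaluated or replaced on its own, because it no longer depends on the record for its link to the resource.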
4. Future Metadata Strategies
• Statement-level rather than record-level management
• Records as units of transport rather than units of management
• Emphasis on evaluation coming in and provenance going out
• Shift in human effort from creating standard cataloging to careful human intervention in machine-based processes
• Extensive use of data created outside libraries
• Intelligent re-use of our legacy data
5. Managing Statements
http://dcpapers.dublincore.org/ojs/pubs/article/view/770/766
6. [Possible] New Roles for Librarians
• Aggregators of relevant metadata content
• Developing methods to expose & redistribute without a central node
• Modeling and documenting best practices in metadata creation, improvement and exposure
• Application profiles are important in this effort
• Developers of vocabularies using bibliographic relationships
• Innovators in using social networks to enhance bibliographic description
9. Harvest/Ingest Plan
• Choosing data sources
• There are known sources out there: some are of good quality, others are usable with improvement
• Tools are needed to help pull data, validate it, cache it, and set it up for evaluation
• Most of these tasks can/should be set up with automated processes, with alerts to human minders when something goes wrong
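The pull, validate, cache and alert steps could be wired together along the following lines. This is a minimal sketch under stated assumptions, not a description of any existing tool: the statement shape, the `harvest` function, the in-memory cache and the 20% rejection threshold are all hypothetical, and the "alert" is simply a warning written to a log.

```python
import logging
from typing import Callable, Iterable

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("harvest")

# Assumed minimal shape of a harvested statement.
REQUIRED_KEYS = ("subject", "predicate", "value")


def is_valid(statement: dict) -> bool:
    """A statement is usable only if every required key has a non-empty value."""
    return all(statement.get(key) for key in REQUIRED_KEYS)


def harvest(source_name: str,
            fetch: Callable[[], Iterable[dict]],
            cache: dict,
            max_reject_ratio: float = 0.2) -> int:
    """Pull from one source, validate, cache the good statements, alert on trouble."""
    try:
        pulled = list(fetch())
    except Exception as exc:  # any failure to pull is a matter for a human minder
        log.warning("ALERT %s: pull failed: %s", source_name, exc)
        return 0
    good = [s for s in pulled if is_valid(s)]
    rejected = len(pulled) - len(good)
    if pulled and rejected / len(pulled) > max_reject_ratio:
        log.warning("ALERT %s: %d of %d statements rejected",
                    source_name, rejected, len(pulled))
    cache.setdefault(source_name, []).extend(good)
    return len(good)


def demo_source() -> list[dict]:
    # Hypothetical data standing in for a real pull from an external source.
    return [
        {"subject": "ex:1", "predicate": "title", "value": "Moby Dick"},
        {"subject": "ex:1", "predicate": "creator", "value": ""},  # fails validation
        {"subject": "ex:2", "predicate": "title", "value": "Walden"},
    ]


def broken_source() -> list[dict]:
    raise ConnectionError("source unavailable")


cache: dict = {}
kept = harvest("demo", demo_source, cache)
failed = harvest("broken", broken_source, cache)
```

Passing the fetch step in as a function keeps the routine independent of any particular protocol, so the same validation and alerting can sit in front of very different sources.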
11. Metadata Evaluation
• Evaluation needs to scale well beyond random sampling
• Statistical and data mining tools need to be brought into the process, to provide both ‘overview’ and specifics of whole data sets
• Improvement specifications, techniques, quality criteria and tools need to be iterative, granular, and shareable
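One simple way to get an ‘overview’ of a whole data set, rather than a random sample, is to profile every property in it. The sketch below assumes statements arrive as (subject, predicate, value) tuples; the `profile` function and the three measures it reports (coverage, distinct values, most common values) are illustrative choices, not an established evaluation method.

```python
from collections import Counter, defaultdict


def profile(statements):
    """Summarise every property across the whole data set.

    statements: iterable of (subject, predicate, value) tuples.
    """
    subjects = set()
    subjects_with = defaultdict(set)   # predicate -> resources that have it
    values = defaultdict(Counter)      # predicate -> how often each value occurs
    for subject, predicate, value in statements:
        subjects.add(subject)
        subjects_with[predicate].add(subject)
        values[predicate][value] += 1
    total = len(subjects)
    return {
        predicate: {
            # share of all resources carrying this property at least once
            "coverage": len(subjects_with[predicate]) / total,
            "distinct_values": len(values[predicate]),
            "top_values": values[predicate].most_common(3),
        }
        for predicate in values
    }


# Hypothetical data set of three resources.
data = [
    ("r1", "title", "A"), ("r1", "language", "eng"),
    ("r2", "title", "B"), ("r2", "language", "eng"),
    ("r3", "title", "C"),
]

report = profile(data)
for predicate, summary in report.items():
    print(predicate, summary)
```

A report like this points to specifics worth a closer look, for example a property with low coverage or one whose values are suspiciously uniform.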
13. Testing, Monitoring & Re-evaluation
• Data will change, and processes must be able to detect that, based on data profiles
• Human intervention should be limited
• Tools need to be built so that non-programmers can run them
• Reading logs, monitoring error reports, checking results and writing specs can/should be done by data specialists (a.k.a. catalogers with training)
• Looking for opportunities for programmers and catalogers to learn together is essential
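Detecting change from data profiles can be as simple as comparing the profile saved at the last harvest with a fresh one. In this sketch a profile is assumed to be a mapping from property name to coverage (the share of resources carrying that property); the `detect_drift` function, the sample numbers and the 0.1 tolerance are hypothetical.

```python
def detect_drift(baseline: dict[str, float],
                 current: dict[str, float],
                 tolerance: float = 0.1) -> list[str]:
    """Compare two profiles and list the changes a data specialist should review."""
    findings = []
    for prop in sorted(set(baseline) | set(current)):
        if prop not in current:
            findings.append(f"{prop}: disappeared")
        elif prop not in baseline:
            findings.append(f"{prop}: new property")
        elif abs(current[prop] - baseline[prop]) > tolerance:
            findings.append(
                f"{prop}: coverage changed {baseline[prop]:.2f} -> {current[prop]:.2f}"
            )
    return findings


# Hypothetical profiles from two successive harvests of the same source.
baseline = {"title": 1.0, "language": 0.9, "creator": 0.8}
current = {"title": 1.0, "language": 0.5, "identifier": 0.7}

findings = detect_drift(baseline, current)
for finding in findings:
    print(finding)
```

The output is a short plain-language list, which fits the aim of having non-programmers run the tools and read the results.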
15. Re-distribution Plan
• If we improve data, we need to expose how we did it (and what we did), for the use of downstream consumers
• New metadata provenance efforts are designed to do this at the statement level
• This strategy can only exist successfully where open licenses allow innovation and wide re-use
• Ideally, distribution AND redistribution should be accomplished with Application Profiles
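To make "expose how we did it (and what we did)" concrete at the statement level, each statement can carry its own history. The sketch below is only an illustration: the `Provenance` and `Statement` classes, their fields and the sample agents are hypothetical and do not follow any particular provenance vocabulary.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Provenance:
    agent: str   # who acted on the statement
    action: str  # what they did
    date: str    # when, as an ISO 8601 date


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    value: str
    history: tuple[Provenance, ...] = ()


def improve(statement: Statement, new_value: str,
            agent: str, action: str, date: str) -> Statement:
    """Return an improved copy whose history records what was done and by whom."""
    step = Provenance(agent, action, date)
    return replace(statement, value=new_value,
                   history=statement.history + (step,))


# Hypothetical statement as it arrived from an outside source.
original = Statement(
    "http://example.org/book/1", "language", "English",
    (Provenance("Source A", "harvested", "2012-01-10"),),
)

improved = improve(original, "eng", "Example Library",
                   "normalized language name to a code", "2012-01-20")

for step in improved.history:
    print(step)
```

Because the original statement is left untouched and the improved copy lists every step, a downstream consumer can see both what changed and who changed it.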
16. Will This Shift Cost Too Much?
• It’s the human effort that costs us
• Cost of traditional cataloging is far too high, for increasingly dubious value
• Our current investments have reached the end of their usefulness
• All the possible efficiencies for traditional cataloging have already been accomplished
• Waiting for leadership from the big players costs us valuable time with no guarantees of results
• We need to figure out how to invest in more distributed innovation and focused collaboration
17. ROI in the LOD World
• Free metadata is essential in a ‘culture economy’
• We need eyeballs, attention, connection for our content!
• Thinking about ROI based on recovering the cost of creating metadata is a dead end
• To drive people to your content, you need to put your data out there
• But once it’s there, it’s out of your control, and we need to get comfortable with that