This presentation summarizes lessons learned from trial audits of several production distributed digital preservation networks. The audits were conducted using the open source SafeArchive system (www.safearchive.org). A central lesson is the importance of designing auditing systems to provide diagnostic information that can be used to explain non-confirmations of audited policies.
1. Prepared for
PLN 2012
UNC, Chapel Hill
October 2012
Auditing PLN’s:
Preliminary Results and Next Steps
Micah Altman,
Director of Research, MIT Libraries
Non-Resident Senior Fellow, The Brookings Institution
Jonathan Crabtree,
Assistant Director of Computing and Archival Research
H.W. Odum Institute for Research in Social Science, UNC
2. Collaborators*
• Nancy McGovern
• Tom Lipkis & the LOCKSS Team
Research Support
Thanks to the Library of Congress, the National Science
Foundation, IMLS, the Sloan Foundation, the Harvard
University Library, the Institute for Quantitative Social
Science, and the Massachusetts Institute of Technology.
* And co-conspirators
3. Related Work
Reprints available from: micahaltman.com
• M. Altman, J. Crabtree, "Using the SafeArchive System: TRAC-Based Auditing of LOCKSS", Proceedings of Archiving 2011, Society for Imaging Science and Technology.
• M. Altman, B. Beecher, & J. Crabtree (2009). "A Prototype Platform for Policy-Based Archival Replication", Against the Grain, 21(2), 44-47.
4. Preview
• Why audit?
• Theory & Practice
– Round 0: Setting up the Data-PASS PLN
– Round 1: Self-Audit
– Round 2: Compliance (almost)
– Round 3: Auditing Other Networks
• What’s next?
6. Short Answer: Why the heck not?
"Don't believe in anything you hear,
and only half of what you see"
- Lou Reed
“Trust, but verify.”
- Ronald Reagan
7. Slightly Longer Answer:
Things Go Wrong
• Physical & hardware failures
• Software failures
• Media failures
• Curatorial error
• Insider & external attacks
• Organizational failure
9. OAIS Model Responsibilities
• Accept appropriate information from Information Producers.
• Obtain sufficient control of the information to ensure long-term preservation.
• Determine which groups should become the Designated Community (DC) able to understand the information.
• Ensure that the preserved information is independently understandable to the DC.
• Ensure that the information can be preserved against all reasonable contingencies.
• Ensure that the information can be disseminated as authenticated copies of the original, or as traceable back to the original.
• Make the preserved data available to the DC.
10. OAIS Basic Implied Trust Model
• Organization is axiomatically trusted to identify designated communities
• Organization is engineered with the goal of:
– Collecting appropriate, authentic documents
– Reliably delivering authentic documents, in understandable form, at a future time
• Success depends upon:
– Reliability of storage systems:
e.g., LOCKSS network, Amazon Glacier
– Reliability of organizations:
e.g., MetaArchive, Data-PASS, Digital Preservation Network
– Document contents and properties:
formats, metadata, semantics, provenance, authenticity
11. Reflections on OAIS Trust Model
• A specific bundle of trusted properties
• Complete neither instrumentally nor ultimately
13. Audit [aw-dit]:
An independent evaluation of
records and activities to
assess a system of controls
Fixity mitigates risk only if used
for auditing.
14. Functions of Storage Auditing
• Detect
corruption/deletion of content
• Verify
compliance with storage/replication policies
• Prompt
repair actions
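As an illustration only (this is not the SafeArchive or LOCKSS implementation, and every name below is hypothetical), the detect and prompt functions can be sketched in a few lines of Python for a local object store:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute a fixity value for one file."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def audit(manifest: dict[str, str], root: Path) -> list[str]:
    """Detect corruption/deletion by comparing current hashes against a
    stored manifest; the returned list is what prompts repair actions."""
    damaged = []
    for name, expected in manifest.items():
        path = root / name
        if not path.exists() or sha256_of(path) != expected:
            damaged.append(name)
    return damaged
```

Verifying compliance with a replication policy then amounts to running this audit across every replica and checking the surviving copy count against the policy minimum.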
15. Bit-Level Audit Design Choices
• Audit regularity and coverage:
on-demand (manually); on object access; on
event; randomized sample;
scheduled/comprehensive
• Fixity check & comparison algorithms
• Auditing scope:
integrity of object; integrity of collection;
integrity of network; policy compliance;
public/transparent auditing
• Trust model
• Threat model
16. Repair
Auditing mitigates risk only if
used for repair.
Key Design Elements
• Repair granularity
• Repair trust model
• Repair latency:
– Detection to start of repair
– Repair duration
• Repair algorithm
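A minimal sketch of how these elements become parameters of a repair loop (hypothetical names; LOCKSS actually repairs peer-to-peer through its polling protocol rather than copying from a single trusted source):

```python
import shutil
from pathlib import Path

def repair(damaged: list[str], root: Path, trusted_source: Path) -> None:
    """Repair at per-file granularity, pulling replacements only from a
    source the repair trust model permits. Repair latency is the time
    from audit detection to the completion of this loop."""
    for name in damaged:
        good_copy = trusted_source / name
        if good_copy.exists():
            (root / name).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(good_copy, root / name)  # repair algorithm: whole-file copy
        else:
            print(f"no authoritative copy of {name}; escalate to operator")
```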
17. LOCKSS Auditing & Repair
Decentralized, peer-to-peer, tamper-resistant replication & repair
Regularity: scheduled
Algorithms: bespoke, peer-reviewed, tamper-resistant
Scope: collection integrity; collection repair
Trust model: publisher is the canonical source of content; changed content is treated as new; replication peers are untrusted
Main threat models: media failure; physical failure; curatorial error; external attack; insider threats; organizational failure
Key auditing limitations: correlated software failure; lack of policy auditing and of public/transparent auditing
18. SafeArchive Auditing & Repair
TRAC-aligned policy auditing as an overlay network
Regularity: scheduled; manual
Fixity algorithms: relies on underlying replication system
Scope: collection integrity; network integrity; network repair; high-level (e.g., TRAC) policy auditing
Trust model: external auditor, with permissions to collect metadata/log information from the replication network; replication network is untrusted
Main threat models: software failure; policy implementation failure (curatorial error; insider threat); organizational failure; media/physical failure through the underlying replication system
Key auditing limitations: relies on the underlying replication system, (now) LOCKSS, for fixity check and repair
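For illustration only (the field names below are invented and are not the SafeArchive policy schema), a TRAC-style replication policy and the compliance half of the audit might be sketched as:

```python
# Hypothetical policy for one collection.
policy = {
    "min_replicas": 3,
    "min_distinct_hosts": 3,
    "max_audit_age_days": 30,
}

def check_compliance(observed: dict, policy: dict) -> list[str]:
    """Compare the observed network state for a collection against
    the policy; return the clauses that fail."""
    failures = []
    if observed["replica_count"] < policy["min_replicas"]:
        failures.append("too few replicas")
    if observed["distinct_hosts"] < policy["min_distinct_hosts"]:
        failures.append("replicas insufficiently distributed")
    if observed["days_since_last_verified"] > policy["max_audit_age_days"]:
        failures.append("audit evidence is stale")
    return failures
```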
19. Theory vs. Practice
Round 0: Setting up the Data-PASS PLN
“Looks ok to me”
- PHB Motto
20. Theory
Expose content (through OAI+DDI+HTTP) and install LOCKSS (on 7 servers)
→ Harvest content (through OAI plugin)
→ Set up PLN configurations
→ LOCKSS magic
→ Done
21. Practice (Year 1)
• OAI plugin extensions required:
– Non-DC metadata
– Large metadata
– Alternate authentication method
– Save metadata record
– Support for OAI sets
– Non-fatal error handling (see the sketch after this list)
• OAI provider required:
– Authentication extensions
– Performance handling for delivery
– Performance handling for errors
– Metadata validation
• PLN configuration required:
– Stabilization around LOCKSS versions
– Coordination around plugin repository
– Coordination around AU definition
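The real OAI plugin is part of LOCKSS and written in Java; as a hedged sketch of what "non-fatal error handling" means in a harvest loop, a generic OAI-PMH ListRecords walk in Python might look like:

```python
import time
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url: str, metadata_prefix: str = "oai_dc"):
    """Yield OAI records, treating per-request failures as non-fatal:
    retry with backoff rather than aborting the whole harvest."""
    url = f"{base_url}?verb=ListRecords&metadataPrefix={metadata_prefix}"
    while url:
        for attempt in range(3):
            try:
                with urllib.request.urlopen(url, timeout=60) as resp:
                    tree = ET.parse(resp)
                break
            except (urllib.error.URLError, ET.ParseError):
                time.sleep(2 ** attempt)  # back off, then retry
        else:
            return  # give up on this request only, after repeated failures
        yield from tree.iter(f"{OAI}record")
        token = tree.find(f".//{OAI}resumptionToken")
        url = (f"{base_url}?verb=ListRecords&resumptionToken={token.text}"
               if token is not None and token.text else None)
```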
22. Theory vs. Practice
Round 1: Self-Audit
“A mere matter of implementation”
- PHB Motto
23. Theory
Gather information from each replica
→ Integrate information → map network state
→ Compare current network state to policy (state == policy?)
→ YES: success
→ NO: add replica, and repeat
25. Practice (Year 2)
• Gathering information required:
– Permissions
– Reverse-engineering UIs (with help)
– Network magic
• Integrating information required:
– Heuristics for lagged information (see the sketch after this list)
– Heuristics for incomplete information
– Heuristics for aggregated information
• Comparing the map to policy required:
a mere matter of implementation
• Adding replicas:
uh-oh, most policies failed, and adding replicas wasn't going to resolve most issues
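One way to picture the lag heuristic, assuming each replica reports per-AU poll results with timestamps (all structures below are hypothetical):

```python
from datetime import datetime, timedelta

MAX_POLL_AGE = timedelta(days=14)  # hypothetical tuning parameter

def integrate(reports: list[dict], now: datetime) -> dict[str, str]:
    """Map each AU to a network state, discounting stale polls rather
    than treating missing evidence as disagreement."""
    rank = {"in agreement": 0, "unknown (lagged poll)": 1, "disagreement": 2}
    state: dict[str, str] = {}
    for r in reports:  # r = {"au": str, "agree": bool, "polled": datetime}
        if now - r["polled"] > MAX_POLL_AGE:
            verdict = "unknown (lagged poll)"
        else:
            verdict = "in agreement" if r["agree"] else "disagreement"
        # keep the most pessimistic verdict seen so far for this AU
        prev = state.get(r["au"])
        if prev is None or rank[verdict] > rank[prev]:
            state[r["au"]] = verdict
    return state
```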
26. Theory vs. Practice
Round 2: Compliance (almost)
"How do you spell 'backup'?
R-E-C-O-V-E-R-Y"
27. Practice (and adjustment) makes perfekt?
• Timings (e.g. crawls, polls)
– Understand
– Tune
– Parameterize heuristics, reporting
– Track trends over time
• Collections
– Change partitioning to AU’s at source
– Extend mapping to AU’s in plugin
– Extend reporting/policy framework to group AU’s
• Diagnostics
– When things go wrong – information to inform adjustment
28. Theory vs. Practice
Round 3: Auditing Other PLNs
“In theory, theory and practice are the same –
in practice, they differ.”
29. Theory
Gather information from each replica
→ Integrate information → map network state
→ Compare current network state to policy (state == policy?)
→ YES: success
→ NO: have AU sizes and polling intervals already been adjusted?
– NO: adjust AU sizes and polling intervals, and repeat
– YES: add replica, and repeat
30. Practice (Year 3)
• 100% of what?
• Diagnostic inference
31. 100% of what?
• No: Of LOCKSS boxes?
• No: Of AU’s?
• Almost: Of policy overall
• Yes: Of policy for specific collection
• Maybe: Of files?
• Maybe: Of bits in a file?
32. What you see: Boxes X, Y, and Z all agree on AU A
Assumption: failures on file harvest are independent; the number of harvested files is large
What you can conclude: Boxes X, Y, and Z have the same content, and the content is good
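A back-of-envelope check of why the independence assumption carries the inference (the failure rate is illustrative, not measured):

```python
# If each box independently harvests a given file badly with probability p,
# three boxes agreeing on the *same wrong bytes* requires at least three
# coinciding failures, so p**3 bounds the per-file false-agreement rate.
p = 0.01                 # illustrative per-box harvest failure rate
n_files = 100_000        # a large AU
false_agreements = n_files * p**3   # ~0.1 expected files
single_box_failures = n_files * p   # ~1,000 expected files
print(false_agreements, single_box_failures)
```

So three-way agreement over many files is strong evidence of good content, but only while failures really are independent; correlated software failure (a shared plugin bug, say) breaks the assumption.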
33. What you see: Boxes X, Y, and Z don't agree
What can you conclude?
34. Hypothesis 1: Disagreement is real, but doesn't really matter.
Non-substantive AU differences (arising from dynamic elements in AU's that have no bearing on the substantive content):
1.1 Individual URLs/files that are dynamic and non-substantive (e.g., logo images, plugins, Twitter feeds) cause content changes (this is common in the GLN).
1.2 Dynamic content embedded in substantive content (e.g., a customized per-client header page embedded in the PDF for a journal article).
Hypothesis 2: Disagreement is real, but doesn't really matter in the longer run (even if the disagreement persists over the long run!)
2.1 Temporary AU differences: versions of objects temporarily out of sync. (E.g., if harvest frequency << source update frequency, but harvest times across boxes vary significantly.)
2.2 Objects temporarily missing. (E.g., recently added objects are picked up by some replicas, not by others.)
Hypothesis 3: Disagreement is real, and matters.
Substantive AU differences:
3.1 Content corruption (e.g., from corruption in storage, or during transmission/harvesting).
3.2 Objects persistently missing from some replicas (e.g., because of a permissions issue at the provider; technical failures during harvest; plugin problems).
3.3 Versions of objects persistently missing or out of sync on some replicas (e.g., harvest frequency > source update frequency, leading to different replicas harvesting different versions of the content). Note that later "agreement" signifies that a particular version was verified, not that all versions have been replicated and verified.
Hypothesis 4: AU's really do agree, but we think they don't.
4.1 Appearance of disagreement caused by incomplete diagnostic information: poll data are missing as a result of system reboot, daemon updates, or other causes.
4.2 Poll data are lagging (from different periods); polls fail but contain information about agreement that is ignored.
36. Design Challenge
• Create more sophisticated algorithms
and
• Instrument PLN data collection
Such that
Observed behavior allows us to distinguish
between hypotheses 1-4.
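A sketch of the decision rule this calls for, where every feature below is hypothetical instrumentation rather than an existing LOCKSS or SafeArchive API:

```python
def classify_disagreement(au_history: dict) -> str:
    """Map observed poll behavior for one AU onto hypotheses 1-4."""
    if au_history["polls_missing_or_lagged"]:
        return "H4: apparent only (incomplete or lagged poll data)"
    if au_history["differing_urls_all_dynamic"]:
        return "H1: real but non-substantive (dynamic elements)"
    if au_history["resolves_within_one_harvest_cycle"]:
        return "H2: real but transient (replicas out of sync)"
    return "H3: real and substantive (corruption or missing objects)"
```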
38. What’s Next?
"It's tough to make predictions,
especially about the future”
-Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston
Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert
Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan
Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey
Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, and others
39. Short Term
• Complete round 3 data collection
• Refinements of current auditing algorithms
– More tunable parameters (yeah?!)
– Better documentation
– Simple health metrics
• Reports, and dissemination
40. Longer Term
• Health metrics, diagnostics, decision support
• Additional audit standards
• Support additional replication networks
• Audit other policy sets
41. Bibliography (Selected)
• B. Schneier, 2012. Liars and Outliers, John Wiley & Sons.
• H.M. Gladney & J.L. Bennett, 2003. "What do we mean by authentic?", D-Lib Magazine 9(7/8).
• K. Thompson, 1984. "Reflections on Trusting Trust", Communications of the ACM 27(8): 761-763.
• D.S.H. Rosenthal, T.S. Robertson, T. Lipkis, V. Reich, & S. Morabito, 2005. "Requirements for Digital Preservation Systems: A Bottom-Up Approach", D-Lib Magazine 11(11).
• CCSDS, 2002. Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book.
This work by Micah Altman (http://micahaltman.com), with the exception of images explicitly accompanied by a separate "source" reference, is licensed under the Creative Commons Attribution-ShareAlike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.