Gone today, here tomorrow: the future of government information and the digital FDLP
1. Gone today, here tomorrow:
the future of government
information and the digital
FDLP
James R. Jacobs
jrjacobs@stanford.edu
lockss-usdocs.stanford.edu
UW i-school
Thursday January 24, 2013
Wednesday, January 23, 2013
Iʼd like to thank Cass Hartnett, the Northwest Government Information Network, the UW Information School, the UW
Association of Library and Information Science Students (ALISS), and the University of Washington Libraries for inviting me to
talk with you today. I hope itʼll be worth your while :-)
2. Aaron Swartz 11.8.86 - 1.11.13
PD haiku notice: do what you feel like /
since the work is abandoned /
the law doesn’t care
http://www.aaronsw.com/weblog/000360
http://www.rememberaaronsw.com/
I dedicate today’s talk to my friend and internet activist Aaron Swartz for his progressive
ideals and dedication to free and open information access.
PD: do what you feel like / since the work is abandoned / the law doesn’t care
3. Agenda
• Historical ideals of the FDLP
• Collection strategies:
• Everyday Electronic Materials (EEMs) “Water droplets”
• Archive-it “Oceans”
• LOCKSS-USDOCS “Waterfalls”
• Collaboration “Reservoirs”
• Reflection
• [[slides available at slideshare.net/freegovinfo]]
introduce and agenda
Weʼre at the very beginning of the digital era, where tools, policies, best practices, etc. are all in flux. In many ways, weʼre at an
age where new metaphors are needed to describe what it is that we as librarians do on a daily basis.
I'd like to talk about the underlying historical ideals of the FDLP, discuss how those ideals have been under fire from both within
and without the library community and argue that those ideals applied to today's new information metaphors give us the best
chance at access to and long-term preservation and assurance of govt information.
Then Iʼll talk about some of the digital collection strategies that Iʼve found to be successful and then conclude with a bit about
collaboration and to-dos.
4. Librarians ...
... Explore
... Collect
... Describe
... Share
... Preserve
... But basically, we explore, collect, describe, share and preserve the world of information. In my humble estimation, format
does not change what it is that we do as librarians! Today I aim to show that the shift to digital does not preclude us from
exploring, collecting, describing, sharing, and preserving government information.
Right up front, I'm a librarian and a collaborator in the LOCKSS-USDOCS distributed digital preservation project (Lots of
Copies Keep Stuff Safe). I've been in academia/education my whole life as a student, teacher, librarian and technologist. I've
worked in libraries since high school and been a government information/FDLP librarian since 2002 and served a 3-year term
on the Depository Library Council, the body which informs and advises the Govt Printing Office regarding issues of the Federal
Depository Library Program. So my mindset/perspective/bias is from one who assists in the scholarly communication process,
one who believes that libraries have a place in the digital information landscape, and one who believes strongly in the idea that
public access to govt information is a fundamental right.
In the print era (which is not over!) we had rules and processes in place to do the things that we do as librarians.
In the digital realm (which is just beginning and will continue to overlap with the print era for the foreseeable future) we are just
beginning to figure out the rules and processes. But the concepts remain the same.
Government documents are the DNA of the democratic process – Carl Malamud would call it the “source code.” And so we
must find ways to continue to give access and preserve this content for the long-term.
5. FDLP principles
• Forward democratic ideals
• Serve public interest / public access / public control / public
preservation
• Serve the information needs of your community
• Forward the long-term institutional viability of libraries
• Promote and leverage collective action
Are you:
--forwarding democratic ideals?
--serving public interest / public access / public control / public preservation?
--serving the information needs of your community?
--forwarding the long-term institutional life of libraries?
--promoting and leveraging collective action?
These are the principles that we as govt information librarians (and librarians in general!) hold dear. Best practices =
practices in which these principles are embedded – and the principles embedded in the FDLP.
If you too believe in these ideals (I hope!), then you already do take actions in support of these values – and probably one
of the main reasons you all have stayed in the field of librarianship is because you believe the following:
--libraries are critical as memory organizations
--local control of collections is imperative (e.g., a large network of libraries resists accidents and natural disasters and is
self-healing). A large network of FDLP libraries can help alleviate and ameliorate the damage and rebuild collections when
those accidents inevitably occur. Just ask my friend Rebecca Blakeley, who gave a wonderful presentation at the 2008 fall
Depository Library Conference about the steps that McNeese State University Library took in rebuilding their documents
collection after heavy damage suffered from Hurricane Rita.
--distributed system is crucial to meet local needs (spread responsibility for content among various locations and
administrations)
--public interest (affirms FDLP libraries’ role in ensuring permanent public access!)
--value of library community
--shared preservation responsibilities
While I talk about the following projects, please keep these principles and ideals in mind. So let’s get to the case study
part of the discussion.
6. “There seems to be an inverse relationship between
convenience of dissemination and preservation standards.”
-- Chuck Humphrey, data librarian, U of Alberta
Over the last 20-30 years, developments in publishing and Internet technologies have affected the way government
information is produced, disseminated, controlled, and preserved. These changes have affected the policies and procedures
of the GPO and, in turn, have affected the depository library program. Despite the often-heard promises that Web
technologies will bring more information to more people more quickly and easily, the actual effects have been decidedly
mixed. The highly visible, short-term successes of rapid dissemination of single titles directly to citizens (e.g., the large
number of downloads of the 9/11 report) mask the loss of a secure infrastructure for long-term preservation of and access to
government information, GPO's Federal Digital System (FDsys) notwithstanding. More and more agencies publish content on
their own Web sites rather than using the GPO conduit (we in the govt info world call the resulting publications "fugitive
documents"), and very few agencies publish to any standards or have policies in place that deal with archiving and preservation. As Chuck
Humphrey, a data librarian friend of mine, once said, “there seems to be an inverse relationship between convenience of
dissemination and preservation standards.”
In addition to this lack of a secure infrastructure, the growing din of the call for digitization of historic govt publications – I
refuse to use the term “legacy”! – from some of the large library associations like ARL, ASERL and CIC, while no doubt a boon
for access today – though with their own unique issues in terms of metadata, provenance, findability, usability etc – is
somewhat of a red herring that makes library administrators believe that they will soon be able to dispose of their physical
collections – not to mention their documents staffs! – and use that space for this week’s buzz word. This call for digitization
may instead have the deleterious effect of damaging the long-term preservation of govt publications.
Lastly, the growing trend toward privatization of govt information has actually caused a decrease in public access despite its
digital nature. This is not a new trend. Herbert Schiller noted this in 1986 in his book "Information and the Crisis Economy."
Speaking of machine-readable formats, he wrote that, "Library information capability is greatly enhanced. Yet this benefit is
accompanied by the abandonment of libraries' historical free access policy. User charges are introduced. The public character
of the library is weakening as its commercial connection deepens. No less important, the composition and character of its
holdings change as the clientele shifts from general public to the ability-to-pay user."
7. GAO/Thomson contract
Carl Malamud. Public.resource.org. 1/23/13 http://sn.im/gao-contract
We've seen over the last several years a disturbing rise in Federal Agencies entering into contracts with private
companies whereby public domain govt documents are digitized and then taken out of the commons via licensing
agreements. See for example, the Government Accountability Office (GAO)'s deal with Thomson-West whereby
Thomson-West digitized the GAO's 20,597 legislative histories of most public laws from 1915-1995 and in return received
exclusive license to sell access to the content. GAO received nothing in return but an account on Thomson's service while
the public received nothing at all.
Last year, NARA entered into a contract with Ancestry.com to serve out the 1940 census schedules (aka enumerators’
notebooks) that were released in 2012 after 72 years. Ancestry agreed to be NARA’s digital infrastructure, offering free
access for 1 year (until April 2013) but henceforth the public would need an Ancestry subscription in order to access the
schedules. And don’t get me started about IBM and Census’ American Factfinder.
Rapid technological change and the misplaced assumption that "it's all in google" have caused some in the FDLP
community to question the need for the FDLP and some others to drop out of the program altogether. I believe that the
inherent nature of digital information actually increases the need for a distributed network of dedicated, legislatively
authorized libraries and librarians. It would be prudent to draw upon the existing infrastructure of FDLP libraries and the
200 years of cumulative experience of these institutions in assuring preservation of and access to government
information. We must reinforce FDLP’s traditional mission of selection, collection, free access, and preservation in the
digital era in order to assure free access to this information into the foreseeable future.
8. FDLP ecosystem
Nobody knows for sure how to preserve digital content for the long-term. This means to me that a loosely coupled,
independently administered, distributed ecosystem is the best way to assure long-term preservation -- many
organizations with many funding models and distributed technical infrastructures have a better shot at preservation than 1
or 2 organizations -- especially if one of those organizations has a tenuous budget, or is a private corporation etc. David
Weinberger described the Web in this way in his book “Small pieces loosely joined” and I think that metaphor holds
equally true for libraries. Here’s a back of the napkin kind of sketch of how I imagine the FDLP ecosystem to look.
How would each of these scenarios deal with or react to different stress situations or threat models (directly out of the
OAIS handbook e.g., reduced budgets, increased demand for privatization, increased demand for censorship or control or
removal of information, media/hardware/software/network failure, natural disaster, organizational failure etc.)? It's easy to
see that a highly replicated, distributed FDLP model of preservation based on common open digital standards and OAIS
would deal with these situations much better than a centralized model. A web is much stronger than a silo. This holds true
for all information, not just govt info of course.
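The "web is stronger than a silo" claim can be put in back-of-the-napkin numbers. The sketch below is only an illustration under a strong assumption that is not from the talk: every copy is lost independently, with the same probability, over whatever period you care about. The function name and the 10% loss figure are made up for the example.

```python
# Back-of-the-napkin model: the chance that at least one copy survives.
# Assumption (illustrative only): each copy is lost independently of the
# others, with the same probability, over the period being considered.

def survival_probability(copies: int, loss_probability: float) -> float:
    """Probability that at least one of `copies` independent copies survives."""
    if copies < 1:
        raise ValueError("need at least one copy")
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    # All copies are lost with probability p**n; anything else is survival.
    return 1.0 - loss_probability ** copies


# Compare a single silo, a pair of organizations, and a 36-library network,
# using a hypothetical 10% chance of losing any one copy.
for n in (1, 2, 36):
    print(n, survival_probability(n, 0.1))
```

The independence assumption is exactly what a silo cannot offer: copies that share one budget, one administration or one software stack tend to fail together, which is why independently administered libraries matter more than the raw copy count.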
Thus ends the soapbox portion of my talk. I’m sure to get back on it later, but for now I’d like to shift gears a bit and talk
about practical matters and about my strategy for collection development and long-term preservation in the FDLP
ecosystem. I’ll run through a few examples for how to conceptualize and actually do digital collection development of govt
information. I like to use a water metaphor to describe my processes. In the digital realm, we have to collect drops of
water, waterfalls as well as the ocean.
First the droplets:
9. EEMs
• Everyday Electronic
Materials
• serendipitous collection
• Collecting the Web a
drop at a time
• Flickr photo by Elle Is Oneirataxic. Attribution-NonCommercial-
ShareAlike 2.0 Generic Creative Commons license
EEMs – or Everyday Electronic Materials – is a Mellon Foundation grant-funded project here at Stanford to build infrastructure
and a workflow to support the collection, description, preservation and public access of digital objects by bibliographers and
subject specialists.
EEMs are those digital materials that are serendipitously referenced in news reports, distributed by posting on Web sites, or
through email notification to scholars and bibliographers; those items that selectors come across in the course of doing their
everyday work. In the past, librarians may have downloaded documents to their desktops and perhaps printed them out and had
them bound (if their administrations were amenable!). Now we’ve got digital stacks in which to collect, preserve and give
access!
**For those interested in more, I’ve got a citation and link at the end of the presentation to my colleague Katherine Kott’s report
on the project. For those chomping at the bit now, just Google Kott, EEM, CNI.
Subject specialist workflow is pretty simple:
1. identify a document (*only pdfs and only monographs at this time)
2. drag url of doc to the EEMs browser widget
3. determine copyright status. Request permission from the copyright owner to harvest/preserve if need be (I can usually skip
this step with public domain govt documents!)
4. describe the document (title, author, rights status, notes)
5. submit to acq and cataloging workflow.
6. EEM is locally stored in our digital repository and accessible through our catalog (searchworks)
7. My EEMs workflow also includes reporting fugitive documents to GPO, but I’ll describe that momentarily.
10. Agencies tracked for EEMs
• Bureau of Land Management CA field office
• Department of Justice
• Bureau of Ocean Energy Management, Regulation and Enforcement (BOEMRE)
(including Minerals Management Service)
• NOAA
• National Cancer Institute
• National Institutes of Health
• USDA
• Office of Management and Budget
• **Harvesting with archive-it:
• EPA
• GAO
• Census current industrial reports
• Thanks lost docs blog! http://lostdocs.freegovinfo.info
My use of the EEMs workflow and tool grew out of 2 other projects focusing on fugitive govt documents – fugitive documents
are a particular passion of mine!
Particularly through the work of the lostdocs blog (lostdocs.freegovinfo.info) – which tracks fugitive document submissions to
the GPO in order to provide a public listing of fugitive documents – I’ve been able to target several agencies that generally are
the worst offenders in terms of fugitive documents:
We also found that 3 other agencies that were top fugitive offenders published too many documents to make the EEMs
workflow feasible. So I’m harvesting the following 3 agencies with Archive-it (which I’ll describe later):
I have an acquisitions staff person working about 3hrs per month to 1) check the agency publications pages for new
publications; 2) check the CGP (http://catalog.gpo.gov) to see if the document has made it into the GPO catalog; and 3)
submit a fugitive document report to GPO and upload the PDF to the EEMs tool.
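The monthly check boils down to a set difference: what the agency has published minus what the catalog already knows about. Here is a minimal sketch of that triage step. It assumes someone has already gathered the agency's new titles and a list of titles found in the CGP. The `Publication` class, the title-matching rule and the example.gov URLs are all hypothetical, and in practice the CGP check is a person searching the catalog, not a script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """A document found on an agency publications page (hypothetical shape)."""
    title: str
    url: str


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide a match."""
    return " ".join(title.lower().split())


def find_fugitives(agency_pubs, catalog_titles):
    """Return the publications whose titles do not appear in the catalog list."""
    known = {normalize(t) for t in catalog_titles}
    return [p for p in agency_pubs if normalize(p.title) not in known]


# Hypothetical month: two new documents, one of them already cataloged.
new_pubs = [
    Publication("Annual Report 2012", "https://example.gov/annual-2012.pdf"),
    Publication("Field Office Survey", "https://example.gov/survey.pdf"),
]
for pub in find_fugitives(new_pubs, ["ANNUAL REPORT 2012"]):
    print("report to GPO and upload to EEMs:", pub.title, pub.url)
```

Matching on titles alone is crude (agencies retitle and reissue documents), so anything the sketch flags still needs a human look before a fugitive report goes to GPO.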
Besides these federal agencies, I also scour the news – and have a google alert set – for leaked and newsworthy govt
documents like the recently debunked LoC report on Iranian intelligence written about on ProPublica. This is sort of like
reverse engineering the collection development process.
11. EEM: http://searchworks.stanford.edu/view/8707790
Through the EEMs workflow, to date we’ve been able to collect over 400 documents like this one (notice the Stanford PURL),
preserve them locally in the Stanford digital repository (SDR) and give access to them through our catalog, searchworks. Think
what we could do if 100 libraries – or 1000! – instituted this workflow? Collectively, we could cover all federal agencies to
assure that no born-digital document within scope of the FDLP falls through the cracks and becomes fugitive.
Next I’ll talk about the ocean:
12. Archive-it
• collecting the Web in
bulk
• Archive-it.org/home/ssrg
• Fotopedia image by Marcus Revertegat. Creative Commons
Attribution 3.0 Unported license.
Archive-it is a subscription service from the Internet Archive – which by the way has many digital copies of historic govt
documents and digitized microfilm available in its text collection. It’s an easy collection-building tool whereby you give the
software a list of urls (called “seeds”), schedule the crawler to harvest the seeds, and then give public access to the
content collected. It’s a good way to contextualize or make sense of the ocean of content on the open Web.
Since 2007 we’ve harvested:
Documents Crawled: 58,590,127 (anything from a spacer gif to an mp4 file is considered a “document”)
Data Archived: 4,616.5 GB (4.6 TB!)
13. SULAIR archive-it home:
http://www.archive-it.org/home/SSRG
What I’m collecting with Archive-It:
• CRS Reports
• FOIA documents and Agency FOIA reading rooms
• Fugitive US agencies: EPA, GAO etc (shout-out to lostdocs.freegovinfo.info)
• Bay Area governments
• Climate change and environmental policy
• G-20
• CA Dept of education curriculum and instruction
• US budget
• FRUS
14. Collection seeds
https://archive-it.org/public/collection.html?id=1078
Metadata: one of our catalogers has created Dublin Core metadata at the collection and seed level. Archive-it allows for
metadata at the document level, but we have not done that. We are in the planning stage to index the metadata for our
catalog. We’re also planning to feed archive-it collections into our LOCKSS caches for redistribution and long-term
preservation.
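For anyone who hasn't seen Dublin Core up close, here is roughly what a collection-level record amounts to. This is a minimal sketch, not the records our cataloger created and not Archive-it's export format: the wrapper element, the helper name and the field values are assumptions for illustration. Only the Dublin Core element namespace and the collection URL are real.

```python
import xml.etree.ElementTree as ET

# The Dublin Core "elements 1.1" namespace, registered under the usual dc: prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> ET.Element:
    """Build a simple record with one dc: element per field (hypothetical helper)."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Illustrative collection-level description; the values are examples only.
collection = dublin_core_record({
    "title": "CRS reports collection",
    "identifier": "https://archive-it.org/public/collection.html?id=1078",
    "type": "Collection",
})
print(ET.tostring(collection, encoding="unicode"))
```

A seed-level record would look the same, with the seed URL as the identifier. Document-level records are possible too, but at tens of millions of crawled documents nobody is going to write them by hand.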
15. search and discover
http://snipurl.com/crs-energyefficiency
We give access to the collections via full text search from the archive-it site and from our databases page. Our crawled seeds
also are brought into the wayback machine for public access.
Search can also be embedded into other Web pages (feel free to copy/paste this code!)
16. Paste this into your HTML:
<form action="http://www.archive-it.org/public/search">
<input type="hidden" name="collection"
value="***COLLECTIONID***" />
<input type="text" name="query" />
<input type="submit" name="go" value="Go" />
</form>
***COLLECTIONID*** = 1078 (CRS reports collection)
add search to other pages
</gratuitous_code>
<form action="http://www.archive-it.org/public/search">
<input type="hidden" name="collection" value="1078" />
<input type="text" name="query" />
<input type="submit" name="go" value="Go" />
</form>
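If you'd rather link straight to a search than embed the form, the same request can be built as a URL. The sketch below only reproduces what a browser would send when the form above is submitted (forms default to GET). The function name is made up, and I have not verified how the Archive-it endpoint treats these parameters beyond what the form itself implies.

```python
from urllib.parse import urlencode

# Same endpoint as the form's action attribute.
SEARCH_ENDPOINT = "http://www.archive-it.org/public/search"


def search_url(collection_id: int, query: str) -> str:
    """Build the URL the search form would request for a collection and query."""
    # Field names mirror the form's inputs: collection, query and the "go" button.
    params = {"collection": collection_id, "query": query, "go": "Go"}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# 1078 is the CRS reports collection; the query text is just an example.
print(search_url(1078, "energy efficiency"))
```

`urlencode` handles the escaping (spaces become `+`), so the result is safe to paste into a research guide or a catalog record as a canned search.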
Lastly, I’ll mention the waterfall that is LOCKSS-USDOCS.
17. LOCKSS-USDOCS
• Targeted Web collection
and distributed preservation
• Lots of Copies Keep Stuff
Safe
• lockss-usdocs.stanford.edu
• Flickr waterfall picture by discordia1967. That’s actually me at Hanakapi`ai
falls in Kauai :-)
lockss-usdocs.stanford.edu
Combines the best of targeted Web harvesting with collaboration and distributed preservation.
18.
LOCKSS – Lots of Copies Keep Stuff Safe – began at Stanford in 1999. The LOCKSS software was built to solve the problem
of long-term preservation of digital content. It is an open-source distributed digital preservation system based on open
standards (OAIS, OpenURL, HTTP, WARC). Originally LOCKSS was focused on journal literature but over the last 10 years
has been used by other projects focusing on government information, theses and dissertations, numeric data, state records
etc.
The goals of LOCKSS are to spread out the economic cost and responsibility of digital preservation and to use off-the-shelf
hardware and open-source software, so that libraries and content publishers can easily and affordably create, preserve, and
archive local electronic collections, and so that readers can access archived and newly published content transparently at its
original URLs through link resolvers like SFX.
Think of a LOCKSS box as a digitally distributed depository library!
SLIDE 16: DECENTRALIZED PRESERVATION (NEED?)
How does LOCKSS work?
There are 2 parts to the LOCKSS software: harvest and content collection; and content checking and replication.
1) any site – for example FDsys.gov – that gives LOCKSS permission to harvest can be collected by the LOCKSS Web
harvester -- the state of the art in Web harvesting!
2) And this is the cool part: LOCKSS goes through a process of checking and polling all digital content in all of the LOCKSS boxes
on a network. If 1 box has content that is different from all of the other boxes, the software will fix the content, assuring that all
content in the whole network is exactly the same. It is, for all intents and purposes, injecting stem cells into the network to
replicate and fix content that’s become corrupted over time.
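To make the polling idea concrete, here is a toy version: every box hashes its copy, the boxes "vote", and any box that disagrees with the majority gets its copy replaced. This is a deliberately simplified sketch and not the actual LOCKSS polling protocol, which is considerably more elaborate. The function names, the box names and the strict-majority rule are all assumptions for illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy so boxes can compare content without shipping it around."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(boxes: dict) -> list:
    """Repair boxes that disagree with the majority; return the repaired box names.

    `boxes` maps a box name to the bytes it holds for one document.
    """
    votes = Counter(digest(content) for content in boxes.values())
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(boxes) / 2:
        # Without a strict majority there is no safe way to pick the good copy.
        raise ValueError("no majority; cannot decide which copy is good")
    good_copy = next(c for c in boxes.values() if digest(c) == winning_digest)
    repaired = []
    for name, content in list(boxes.items()):
        if digest(content) != winning_digest:
            boxes[name] = good_copy  # overwrite the damaged copy
            repaired.append(name)
    return repaired


# Three hypothetical boxes; one has suffered bit rot.
network = {
    "library-a": b"Federal Register, vol. 78",
    "library-b": b"Federal Register, vol. 78",
    "library-c": b"Federal Regist3r, vol. 78",
}
print(poll_and_repair(network))
```

The more boxes there are, the harder it is for damage (or tampering) to win the vote, which is the practical reason to keep recruiting partners.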
That’s it. LOCKSS is elegant in its simplicity and proven effective in keeping LOCAL(!) digital content safely preserved over
time. This is as close as it gets to the Unix maxim of “do one thing and do it well.”
19. LOCKSS-USDOCS
• LOCKSS for US Documents
• Replicates FDLP in the digital environment
• “digital deposit” (for more on “digital deposit,” see
http://freegovinfo.info/taxonomy/term/3)
• Tamper evident
• 36 libraries and GPO participating
So now you can see why some of us in the documents community are so excited about LOCKSS and why we decided to
implement LOCKSS-USDOCS. Portland State and Simon Fraser Universities are the closest partners but I’m always looking
for more.
Using the LOCKSS software we are re-implementing a tamper evident distributed preservation system for digital documents.
Rather than a central silo on a .gov server, digital govt documents reside on 36 servers at 36 different libraries (and counting!).
20. LOCKSS-USDOCS is ...
Federal register, code of federal regulations, congressional
record, congressional bills, congressional reports, US
Code, Public&Private laws, Public Papers of the President,
historic supreme court decisions, US Statutes at Large,
GAO Reports, US Budget ...
and more!!
http://www.gpo.gov/fdsys/browse/collectiontab.action
GPO has been instrumental in this process by putting LOCKSS permission statements on all 44 FDsys collections. This
includes:
Federal register, code of federal regulations, congressional record, congressional bills, congressional reports, US Code,
Public&Private laws, Public Papers of the President, historic supreme court decisions, US Statutes at Large, GAO Reports, US
Budget, etc., with many of these going back to the early 1990s when they first went digital.
In the 2008 Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Abby Smith Rumsey wrote, “Access to
valuable digital materials tomorrow depends upon preservation actions taken today; and, over time, access depends on
ongoing and efficient allocation of resources to preservation.”
With LOCKSS-USDOCS we’re taking collective responsibility today for long-term preservation of digital depository materials.
21. Collaboration
• Farmington Plan Redux
• Summer digital FDLP Institute
• Adopt a federal agency
• Join LOCKSS-USDOCS, TRAIL and other
digitization/digital preservation projects
• Seed the cloud:
• Start blogging your Q&As and editing
Wikipedia articles
http://snipurl.com/qa-average-tariff-
levels
• Catalog, catalog, catalog!
Ok, here’s James getting back on his soapbox!
As you can see, the technological tools are there. But there’s a need for a “Farmington Plan redux”:
The Farmington Plan, which lasted from 1948 - 1972, was an innovative ARL program of collaborative collection development
whereby subscribing libraries would have responsibility for collecting and cataloging research materials in certain subject and/
or linguistic areas and would then distribute records (in the form of cards) to the National Union Catalog.
Moving forward, here are some things that we need to do as a community to realize this Farmington Plan Redux and build the
digital FDLP reservoir!:
--First and foremost, set up a summer digital FDLP institute modeled on the ICPSR data library workshop which has trained a
generation of data librarians. Cass and I have talked about this before and I think this is one of the most critical pieces of the
Farmington Plan Redux. The institute would train govt information librarians (and those interested in govt information) on the
ins and outs of the Open Archival Information System (OAIS) and other open digital library tools and standards – including a
proposed standard that my friend and FGI co-conspirator Jim Jacobs and I have written about in a soon to be published D-Lib
article called the "Digital-Surrogate Seal of Approval" (DSSOA), a simple way of describing and guaranteeing to end-users the
quality and accuracy of existing digital surrogates created from printed books and other non-digital originals. The institute
would teach techniques for expanding access to both digital and paper collections, give librarians a framework for updating
their understanding, increase their awareness of digital archival concepts, and expand their digital toolboxes to
include Web harvesting, digital information collection and organization, building and using Web tools, and the semantic Web.
--Adopt a federal agency (or better yet, a local/regional office of a federal agency). Submit fugitive documents to GPO for
inclusion in the CGP and distribution out to other depositories.
--Join LOCKSS, the Technical Reports Archive and Information Library (TRAIL) – shout-out to Mel DeSart who’s been
instrumental in building up TRAIL! – and other digitization/digital preservation projects.
--Seed the cloud:
Start blogging your Q&As and editing Wikipedia articles with library resources. Your users are online and using Google and other
search engines to find stuff. This is an easy way to highlight your collections and your library’s resources and services.
Highlighting your collections online brings users to your library.
Shoutout to Ann Lally and Carolyn Dunford for their 2007 D-Lib article about seeding Wikipedia articles!
22. “...let us save what remains: not by vaults and locks
which fence them from the public eye and use in
consigning them to the waste of time, but by such
a multiplication of copies, as shall place them
beyond the reach of accident.”
— Thomas Jefferson, February 18, 1791
Digital changes a lot of things about information, but it doesn't change the need to collect it, share it, preserve it, and give
access to it. As my friend, mentor and FGI co-conspirator Jim Jacobs recently stated, "lots of collections keep stuff safe!" (yes
there are 2 of us working on FGI!)
“...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the
waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.”
Or in other words:
24. Further reading
• Future of the Federal Depository Library Program. Free Government
Information. http://freegovinfo.info/taxonomy/term/1087
• “Open Government Publications” Letter to Deputy CTO Noveck.
http://freegovinfo.info/node/2970
• “Digital Deposit.” Free Government Information.
http://freegovinfo.info/taxonomy/term/3
• Preservation for all: LOCKSS-USDOCS and our digital future. James Jacobs
and Victoria Reich. Documents to the People (DttP) Volume 38:3 (Fall 2010).
http://freegovinfo.info/system/files/lockssusdocs-dttp38%283%29.pdf
• Everyday Electronic Materials in Policy and Practice. Coalition for Networked
Information (CNI) project briefing. Fall 2010. Katherine Kott.
http://sn.im/eems-report
• A Guide to Distributed Digital Preservation. K. Skinner and M. Schultz, Eds.
(Atlanta, GA: Educopia Institute, 2010). http://www.metaarchive.org/GDDP
• http://lockss-usdocs.stanford.edu