1. A survey of web-based art resources with
findings applicable to FARL electronic records
collection development
Alison Rhonemus, LIS 698, Seminar and Practicum, Dr. Tula Giannini
Frick Art Reference Library
Deborah Kempe, Chief, Collections Management & Access
Web Survey and Collection Development
Coffee on the terrace
2. M-LEAD-TWO
Intern enterprises -
"collection assessments, digital resource surveys,
web archiving, provide support for important
consortial programs such as shared resources"
● Brooklyn Museum: Mark Daly, Ronnette Hope,
Project Manager: Emily Atwater
● NYARC Latin American Resources (MOMA):
Ralph Baylor
● FARL: Gretchen Nadasky, Alison Rhonemus
3. Frick Art Reference Library
In early 2011, the Frick Art Reference Library
and the Thomas J. Watson Library at The
Metropolitan Museum of Art completed a pilot
project to address coordinated collecting of
born-digital auction catalogs using CONTENTdm
and Archive-It.
4. FARL web archiving program is situated in Collection Development.
Current plans for website capture include online auction catalogs and art web resources
cataloged by NYARC.
Fellow M-LEAD-TWO intern Gretchen Nadasky has just described online auction
catalogs.
My project focused on NYARC cataloged websites.
5. Web Archiving
"The Internet Archive is already doing it."
Actually, the IA is providing the tools for
other institutions to use in archiving.
6. Archive-It
uses open source tools developed by the
Internet Archive
● Heritrix Web Crawler
● Wayback Interface
● WARC format, an ISO standard
7.
8. the report and manual checks
Partner and WAYBACK interface
Quality Assurance
9. • Password-protected sites: cannot be archived
• JavaScript: more complicated implementations
can be difficult to capture and display. Ongoing
area of development.
• Videos: difficulty with some proprietary formats
• Form- and database-driven content: may be
archived using a sitemap or other direct links to the
content.
Evaluating seeds
10. Robots.txt Blocks
The crawler by default respects all robots.txt files. Check
post-crawl reports for blocked seeds or documents.
If your site is blocked:
a) Contact the site owner and ask if they will unblock it
b) Ask your Partner Specialist to turn on the “ignore robots”
feature in your account
Notes:
/ denotes single directory seed
subdomains.archive.org (add individually or expand seed)
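A robots.txt block can also be spotted before a crawl is run. The following is a minimal sketch, not part of the Archive-It workflow itself: it uses Python's standard `urllib.robotparser` to test a seed against a robots.txt file. The sample robots.txt text, the example.org URLs, and the `archive.org_bot` user-agent string are assumptions chosen for illustration; check the user agent your own crawls actually send.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, seed_url: str,
               user_agent: str = "archive.org_bot") -> bool:
    """Return True if robots_txt disallows user_agent from fetching seed_url.

    robots_txt is the text of the site's robots.txt file; the default
    user_agent is an assumption made for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, seed_url)


# Hypothetical robots.txt that blocks one directory for every crawler.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

for seed in ("http://example.org/catalog/",
             "http://example.org/private/report.pdf"):
    status = "blocked" if is_blocked(SAMPLE_ROBOTS, seed) else "allowed"
    print(seed, status)
```

A seed reported as blocked here is a candidate for option a) or b) above. The post-crawl report remains the authoritative record of what the crawler was actually refused.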
11. Site Survey Criteria
● HTML/Flash/PDF
● images
● embedded material
● links
● directories and subdomains
● terms, rights statements and permissions
13. More of the obvious
Sites created without the intention of
being archived are the sites in need of
archiving.
14. Survey Says
● 257 cataloged entries
● 168 resources are possible to capture
● 82 resources would require more research or
display definite red flags for web archiving.
● PDFs are available for at least some of the
content in 75 resources.
● Flash was an element in 23 resources
● 16 sites used HTML5
● 54 used a CMS like Drupal or WordPress
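To put the counts above in proportion, here is a short sketch that expresses each figure as a share of the 257 cataloged entries. The counts come from the slide. The dictionary labels and the `share` helper are names introduced only for this illustration.

```python
TOTAL_ENTRIES = 257  # cataloged entries surveyed

# Counts as reported in the survey; labels are shorthand for this sketch.
counts = {
    "possible to capture": 168,
    "needs research or red flags": 82,
    "PDF for some content": 75,
    "uses Flash": 23,
    "uses HTML5": 16,
    "uses a CMS": 54,
}


def share(count: int, total: int = TOTAL_ENTRIES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in counts.items():
    print(f"{label}: {count} of {TOTAL_ENTRIES} ({share(count)}%)")
```

The categories overlap (a single resource may offer PDFs and also use a CMS), so the percentages are not expected to sum to 100.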
15. There were 3 cataloged resources no longer
available on the live web but viewable through
Internet Archive.
Another 2 defunct resources were not available
through Internet Archive.
The main page for one of these lost resources was
available as a snapshot in WAYBACK but the actual
cataloged resource was not available.
21. Plans
● Upcoming grants
● Capture of NYARC institution websites
● Include Wayback interface links in
Arcade catalog records
● Continue to identify websites for
capture and implement capture
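One way the Wayback links for catalog records might be generated is sketched below. It assumes the Archive-It Wayback address pattern `https://wayback.archive-it.org/<collection id>/<timestamp>/<original URL>`. The collection ID `1234` and the example.org address are hypothetical placeholders, and the function name is introduced only for this illustration. How the link is stored in an Arcade record is not covered here.

```python
import re


def wayback_link(original_url: str, collection_id: int,
                 timestamp: str = "*") -> str:
    """Build an Archive-It Wayback link for a captured URL.

    timestamp is either "*" (list all captures) or a 14-digit
    YYYYMMDDhhmmss capture time. The URL pattern is an assumption
    stated in the lead-in, not something taken from the slides.
    """
    if timestamp != "*" and not re.fullmatch(r"\d{14}", timestamp):
        raise ValueError("timestamp must be '*' or 14 digits (YYYYMMDDhhmmss)")
    return f"https://wayback.archive-it.org/{collection_id}/{timestamp}/{original_url}"


# Hypothetical collection ID and resource URL.
print(wayback_link("http://example.org/catalog/", 1234))
print(wayback_link("http://example.org/catalog/", 1234, "20110301120000"))
```

Using `*` as the timestamp points readers at the list of all captures, which avoids editing the catalog record each time a new crawl runs.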
22. Conclusions
○ Digital resources not prevalent enough to
reassign current staff
○ Website capture most costly in terms of staff time
○ Copyright continues to be an issue
○ Long term digital preservation needs yet to be
assessed
○ Capture of Frick Collection sites and NYARC will
serve as a challenging test case