1. © Copyright 2013 LucidWorks
Solr Powered Libraries:
A survey of the world's knowledge bases
May 2, 2013
Presented by Erik Hatcher
Thursday, May 2, 13
2. © 2013 LucidWorks
Abstract
Using Apache Lucene and Solr search technologies, information and
knowledge have become vastly more searchable, findable, and accessible.
Because scholars and researchers are some of the most demanding users of
search systems, the problems encountered by the implementers are complex.
For example, many of the applications built on these technologies also thrive on
intentionally designed-in serendipitous discovery capabilities, bringing to light
previously unknown, yet related and potentially interesting, content.
Libraries and other public knowledge-sharing environments, such as
Wikipedia, generally embrace "open source" and community-improving
contributions as core principles, making a lovely synergy with the power,
features, and community-driven ecosystem provided by Lucene and Solr.
This talk will introduce you to several Solr powered library-related systems,
detail how they work, and leave you with lessons learned that can be applied to
your applications.
3. © 2013 LucidWorks
Real Solar Powered Library!
•http://www.ktsm.com/news/texas-library-runs-sunshine
4. © 2013 LucidWorks
Card carrying library geek
•Applied Research in Patacriticism (ARP)
- Rossetti Archive: http://www.rossettiarchive.org
- NINES: http://www.nines.org/
- Collex: http://www.collex.org
•Blacklight
- originated as an implementation of Solr Flare
•Presentations
- http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013
- Library of Congress: "Solr Powered Libraries" (2007)
»http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113
- EBTI/CBETA Conference 2008
- Publication: “Library 2.0 Initiatives in Academic Libraries”
•Windsor Lucene Summit
•eIFL-FOSS
7. © 2013 LucidWorks
Card catalog
•the original inverted index
http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg
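To make the card catalog analogy concrete, here is a minimal sketch of an inverted index in Python: each term points to the cards filed under it, much as a subject drawer points to the books on that subject. The `cards` data and the `build_inverted_index` helper are made up for illustration, and whitespace tokenization stands in for the far richer analysis Lucene performs.

```python
from collections import defaultdict


def build_inverted_index(cards):
    """Map each lowercased term to the sorted list of card ids containing it."""
    index = defaultdict(set)
    for card_id, text in cards.items():
        # Naive analysis: lowercase and split on whitespace (an assumption,
        # not how Lucene analyzers actually tokenize).
        for term in text.lower().split():
            index[term].add(card_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical catalog cards: card id -> subject line
cards = {
    1: "Solr search server",
    2: "Lucene search library",
    3: "Library card catalogs",
}

index = build_inverted_index(cards)
print(index["search"])   # ids of cards filed under "search"
print(index["library"])  # ids of cards filed under "library"
```

Looking up a term is a single dictionary access, which is the same reason a patron goes to the drawer for a subject rather than walking the shelves.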
11. © 2013 LucidWorks
HathiTrust
• "partnership of major research institutions and libraries working to ensure
that the cultural record is preserved and accessible long into the future."
• 10.5M books, 12TB OCR+metadata, hundreds of languages
- "Books are different"
- http://code4lib.org/conference/2013/burton-west
• http://www.hathitrust.org/blogs/large-scale-search
- http://www.hathitrust.org/blogs/large-scale-search/too-many-words
- "org.apache.solr.common.SolrException: Impossible Exception"
- CommonGrams
- word segmentation: autoGeneratePhraseQueries="false"
• HathiTrust Research Center
- The infrastructure includes an entrance portal, search and collection-building tools (using
Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus
(more than 3 million volumes). In addition to the production services, the HTRC offers a
development “sandbox”. The sandbox runs against non-Google scanned content (about
260,000 volumes) and provides a test-bed for interested researchers to experiment with writing
their own algorithms for use in the HTRC infrastructure.
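The CommonGrams bullet above refers to indexing word pairs that involve very common words, so that a phrase query can match a rarer combined token instead of scanning the positions of a word like "the". The following is a simplified Python sketch of that index-time idea, not the Lucene filter itself; the `common_grams` function, the sample sentence, and the tiny common-word list are all assumptions for illustration.

```python
def common_grams(tokens, common_words):
    """Emit every token, plus a bigram 'a_b' whenever a or b is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # the original unigram is always kept
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                # Join the adjacent pair into a single, much rarer token.
                out.append(f"{tok}_{nxt}")
    return out


tokens = "the rain in spain".split()
print(common_grams(tokens, {"the", "in"}))
```

In a corpus of millions of OCR'd books, a token such as `the_rain` occurs far less often than `the`, which is the property that makes phrase queries containing common words cheaper to evaluate.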
12. © 2013 LucidWorks
Smithsonian Institution
•http://collections.si.edu
•Many disparate data sources:
- 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical
Observatory, research centers in Panama, Boston, New York, Maryland, and
Virginia
•"Documents" of all varieties:
- Photographs, paintings, manuscripts, letters, postage stamps, scientific
specimens, rockets, airplanes, postcards, sound recordings, posters,
decorative arts, ceramics, maps, sculptures, publication papers, books, trade
catalogs, etc.
•User tagging, negative/exclude filtering, DIH SolrEntityProcessor
•http://bit.ly/13P41YJ
- http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
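The negative/exclude filtering mentioned above can be expressed in Solr as a filter query (`fq`) whose clause is prefixed with `-`. Below is a minimal Python sketch of building such request parameters. The field name `object_type`, the sample values, and the `build_select_params` helper are hypothetical and are not taken from the Smithsonian schema.

```python
from urllib.parse import urlencode


def build_select_params(query, exclude=None, rows=10):
    """Build Solr select parameters; each excluded (field, value) becomes a negative fq."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    for field, value in (exclude or []):
        # A leading '-' negates the clause; the quotes keep a multi-word
        # value together. Values containing quotes would need escaping,
        # which this sketch does not handle.
        params.append(("fq", f'-{field}:"{value}"'))
    return params


# Hypothetical search: rockets, but not postage stamps depicting them.
params = build_select_params("rocket", exclude=[("object_type", "Postage stamps")])
print(urlencode(params))
```

Because each exclusion is its own `fq`, a user interface can add or remove exclusions independently without rewriting the main query.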
16. © 2013 LucidWorks
Astrophysics Data System Labs
•Smithsonian, NASA, Harvard
•http://adslabs.org
http://code4lib.org/conference/2013/luker
19. © 2013 LucidWorks
• "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for
any Solr index. Blacklight provides a default user interface which is customizable via the
standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous
data, allowing different information displays for different types of objects."
- http://projectblacklight.org
• Founded at the University of Virginia (2007): search.lib.virginia.edu
- UV-A solar radiation == blacklight
• Initial contributors: UVa, Stanford, JHU, WGBH
• University of Hull, United States Holocaust Memorial Museum, University of Wisconsin-
Madison, Tufts, Australian gov't (Natural Resource Management), Penn State's
ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University,
Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and
Issue Campaign Exchange, a one-stop web-based public library of progressive state
and local laws), and many more
• http://projecthydra.org/ uses Blacklight as UI component
25. © 2013 LucidWorks
Community and Resources
•code4lib:
- http://www.code4lib.org/
•HathiTrust folks
- http://www.hathitrust.org/blogs/large-scale-search
- http://robotlibrarian.billdueber.com/
•http://bighumanities.net/
- The Workshop on Big Humanities will be held in conjunction with the 2013
IEEE International Conference on Big Data (IEEE BigData 2013), which will
take place between 6-9 October 2013 in Silicon Valley, California, USA, and
which provides a leading international forum for disseminating the latest
research in the growing field of “big data”