© Copyright 2013 LucidWorks
Solr Powered Libraries:
A survey of the world's knowledge bases
May 2, 2013
Presented by Erik Hatcher
Thursday, May 2, 13
Abstract
Using Apache Lucene and Solr search technologies, information and
knowledge have become vastly more searchable, findable, and accessible.
Because scholars and researchers are some of the most demanding users of
search systems, the problems encountered by the implementers are complex.
For example, many of the applications built on these technologies also thrive on
intentionally designed-in serendipitous discovery capabilities, bringing to light
previously unknown, yet related and potentially interesting, content.
Libraries and other public knowledge-sharing environments, such as
Wikipedia, generally embrace "open source" and community-improving
contributions as core principles, making a lovely synergy with the power,
features, and community-driven ecosystem provided by Lucene and Solr.
This talk will introduce you to several Solr powered library-related systems,
detail how they work, and leave you with lessons learned that can be applied to
your applications.
Real Solar-Powered Library!
•http://www.ktsm.com/news/texas-library-runs-sunshine
Card carrying library geek
•Applied Research in Patacriticism (ARP)
- Rossetti Archive: http://www.rossettiarchive.org
- NINES: http://www.nines.org/
- Collex: http://www.collex.org
•Blacklight
- originated as an implementation of Solr Flare
•Presentations
- http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013
- Library of Congress: "Solr Powered Libraries" (2007)
»http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113
- EBTI/CBETA Conference 2008
- Publication: “Library 2.0 Initiatives in Academic Libraries”
•Windsor Lucene Summit
•eIFL-FOSS
Rossetti Archive
NINES/Collex
Card catalog
•the original inverted index
http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg
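The card-catalog analogy can be made concrete with a toy inverted index: each term points to the "cards" (documents) that contain it. This is only a minimal sketch, not how Lucene actually stores its index; the sample titles and the `build_inverted_index` helper are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical catalog cards: id -> title
cards = {
    1: "Lucene in Action",
    2: "Ant in Action",
    3: "Solr in Practice",
}

index = build_inverted_index(cards)
print(index["action"])  # [1, 2]
print(index["in"])      # [1, 2, 3]
```

Looking up a term is then a single dictionary access, much like pulling the drawer for a subject heading.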
•http://openlibrary.org/
- project of the Internet Archive
•Goal: "A (community editable) web page for every book"
dp.la - Digital Public Library of America
Lucene/ElasticSearch Powered
Wikimedia/Wikipedia/MediaWiki
•Solr powered: translation memory service, GeoData extension, etc.
•"heavily modified Lucene" powers main site search currently
HathiTrust
• "partnership of major research institutions and libraries working to ensure
that the cultural record is preserved and accessible long into the future."
• 10.5M books, 12TB OCR+metadata, hundreds of languages
- "Books are different"
- http://code4lib.org/conference/2013/burton-west
• http://www.hathitrust.org/blogs/large-scale-search
- http://www.hathitrust.org/blogs/large-scale-search/too-many-words
- "org.apache.solr.common.SolrException: Impossible Exception"
- CommonGrams
- word segmentation: autoGeneratePhraseQueries="false"
• HathiTrust Research Center
- The infrastructure includes an entrance portal, search and collection-building tools (using
Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus
(more than 3 million volumes). In addition to the production services, the HTRC offers a
development “sandbox”. The sandbox runs against non-Google scanned content (about
260,000 volumes) and provides a test-bed for interested researchers to experiment with writing
their own algorithms for use in the HTRC infrastructure.
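The CommonGrams item above refers to indexing a very common word together with its neighbour as a single token, so that phrase queries containing common words do not have to walk enormous postings lists. The sketch below imitates the index-time idea in plain Python; it is a simplified illustration, not Solr's `CommonGramsFilter` code, and the word list, the `_` separator, and the `common_grams` function name are assumptions.

```python
def common_grams(tokens, common_words, sep="_"):
    """Emit every token, plus a joined bigram wherever either neighbour is a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(tok + sep + nxt)
    return out


# Hypothetical common-word list; a real deployment would derive one per corpus.
common = {"the", "in"}

print(common_grams(["the", "rain", "in", "spain"], common))
# ['the', 'the_rain', 'rain', 'rain_in', 'in', 'in_spain', 'spain']
```

A phrase query such as "rain in spain" can then be matched through the much rarer tokens `rain_in` and `in_spain` instead of the very frequent `in`.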
Smithsonian Institution
•http://collections.si.edu
•Many disparate data sources:
- 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical
Observatory, research centers in Panama, Boston, New York, Maryland, and
Virginia
•"Documents" of all varieties:
- Photographs, paintings, manuscripts, letters, postage stamps, scientific
specimens, rockets, airplanes, postcards, sound recordings, posters,
decorative arts, ceramics, maps, sculptures, publication papers, books, trade
catalogs, etc.
•User tagging, negative/exclude filtering, DIH SolrEntityProcessor
•http://bit.ly/13P41YJ
- http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
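Negative/exclude filtering, mentioned above, can be expressed in Solr as a filter query (`fq`) prefixed with a minus sign. The following is a rough sketch of building such request parameters; the field names `object_type` and `data_source`, their values, and the `build_select_params` helper are hypothetical and not taken from the Smithsonian's actual schema.

```python
from urllib.parse import urlencode


def build_select_params(user_query, include=(), exclude=()):
    """Build Solr /select parameters with include and exclude filter queries."""
    params = [("q", user_query)]
    for field, value in include:
        params.append(("fq", f'{field}:"{value}"'))
    for field, value in exclude:
        # A leading "-" turns the filter into an exclusion.
        params.append(("fq", f'-{field}:"{value}"'))
    return params


params = build_select_params(
    "rocket",
    include=[("object_type", "Photographs")],   # hypothetical field and value
    exclude=[("data_source", "National Zoo")],  # hypothetical field and value
)
print(urlencode(params))
```

Each facet the user includes or excludes becomes its own `fq`, which keeps the main `q` untouched and lets Solr cache the filters independently.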
•SerialsSolutions Summon
•http://www.serialssolutions.com/en/services/summon
•SaaS, single unified index, match & merge
Astrophysics Data System Labs
•Smithsonian, NASA, Harvard
•http://adslabs.org
http://code4lib.org/conference/2013/luker
•vufind.org
•Powers main HathiTrust UI (currently) and many more
- see http://vufind.org/wiki/installation_status
• "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for
any Solr index. Blacklight provides a default user interface which is customizable via the
standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous
data, allowing different information displays for different types of objects."
- http://projectblacklight.org
• Founded at the University of Virginia (2007): search.lib.virginia.edu
- UV-A solar radiation == blacklight
• Initial contributors: UVa, Stanford, JHU, WGBH
• University of Hull, United States Holocaust Memorial Museum, University of
Wisconsin-Madison, Tufts, Australian gov't (Natural Resource Management), Penn State's
ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University,
Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and
Issue Campaign Exchange, a one-stop web-based public library of progressive state
and local laws), and many more
• http://projecthydra.org/ uses Blacklight as UI component
searchworks at Stanford
Advanced search at Stanford's searchworks
searchworks:
Mapping Text Boxes to Solr query pieces
•http://code4lib.org/conference/2010/dushay_keck
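One plausible way to map advanced-search text boxes onto Solr query pieces is to wrap each non-empty box in a nested `_query_` clause that uses dismax local params. The sketch below is an assumption about the general technique, not the SearchWorks code; the parameter names `qf_<field>` and `pf_<field>` are hypothetical and would have to be defined in the request handler configuration.

```python
def build_advanced_query(boxes, op="AND"):
    """Turn {field: user text} into one Solr query string of nested dismax clauses."""
    pieces = []
    for field, text in boxes.items():
        text = text.strip()
        if not text:
            continue  # skip boxes the user left empty
        # Escape backslashes and double quotes so the clause stays well-formed.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        pieces.append(
            f'_query_:"{{!dismax qf=$qf_{field} pf=$pf_{field}}}{escaped}"'
        )
    return f" {op} ".join(pieces)


# Hypothetical form input: one text box per search field.
form = {"title": "moby dick", "author": "melville", "subject": ""}
print(build_advanced_query(form))
# _query_:"{!dismax qf=$qf_title pf=$pf_title}moby dick" AND _query_:"{!dismax qf=$qf_author pf=$pf_author}melville"
```

Keeping the field weights in server-side parameters means the form only ever sends user text plus a field label, and relevancy tuning stays in the Solr configuration.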
•https://catalyst.library.jhu.edu/
Rock and Roll!
•\m/
Community and Resources
•code4lib:
- http://www.code4lib.org/
•HathiTrust folks
- http://www.hathitrust.org/blogs/large-scale-search
- http://robotlibrarian.billdueber.com/
•http://bighumanities.net/
- The Workshop on Big Humanities will be held in conjunction with the 2013
IEEE International Conference on Big Data (IEEE BigData 2013), which will
take place 6-9 October 2013 in Silicon Valley, California, USA, and
which provides a leading international forum for disseminating the latest
research in the growing field of “big data”.
http://heatherbrewer.com/blog/2013/04/15/libraries-rock/
Thursday, May 2, 13

Mais conteúdo relacionado

Mais procurados

Rightscaling, engagement, learning: reconfiguring the library for a network e...
Rightscaling, engagement, learning: reconfiguring the library for a network e...Rightscaling, engagement, learning: reconfiguring the library for a network e...
Rightscaling, engagement, learning: reconfiguring the library for a network e...
lisld
 
The Biodiversity Heritage Library and bibliographic citations: towards new u...
The Biodiversity Heritage Library and bibliographic citations: towards new u...The Biodiversity Heritage Library and bibliographic citations: towards new u...
The Biodiversity Heritage Library and bibliographic citations: towards new u...
Trish Rose-Sandler
 
Connecting the Dots: Linking Digitized Collections Across Metadata Silos
Connecting the Dots: Linking Digitized Collections Across Metadata SilosConnecting the Dots: Linking Digitized Collections Across Metadata Silos
Connecting the Dots: Linking Digitized Collections Across Metadata Silos
OCLC
 
Pilots & Partnerships: University Academic Computing and University Libraries...
Pilots & Partnerships: University Academic Computing and University Libraries...Pilots & Partnerships: University Academic Computing and University Libraries...
Pilots & Partnerships: University Academic Computing and University Libraries...
Chris Freeland
 
MichalkoLibrary Collaboration iSchoolShareable-1
MichalkoLibrary Collaboration iSchoolShareable-1MichalkoLibrary Collaboration iSchoolShareable-1
MichalkoLibrary Collaboration iSchoolShareable-1
MRJPM
 

Mais procurados (20)

International Digital Library Initiatives
International Digital Library InitiativesInternational Digital Library Initiatives
International Digital Library Initiatives
 
Rightscaling, engagement, learning: reconfiguring the library for a network e...
Rightscaling, engagement, learning: reconfiguring the library for a network e...Rightscaling, engagement, learning: reconfiguring the library for a network e...
Rightscaling, engagement, learning: reconfiguring the library for a network e...
 
Next Steps for IMLS's National Digital Platform
Next Steps for IMLS's National Digital PlatformNext Steps for IMLS's National Digital Platform
Next Steps for IMLS's National Digital Platform
 
Building the New Open Linked Library
Building the New Open Linked LibraryBuilding the New Open Linked Library
Building the New Open Linked Library
 
Working collaboratively: scaling infrastructure, services, learning and innov...
Working collaboratively: scaling infrastructure, services, learning and innov...Working collaboratively: scaling infrastructure, services, learning and innov...
Working collaboratively: scaling infrastructure, services, learning and innov...
 
The Biodiversity Heritage Library and bibliographic citations: towards new u...
The Biodiversity Heritage Library and bibliographic citations: towards new u...The Biodiversity Heritage Library and bibliographic citations: towards new u...
The Biodiversity Heritage Library and bibliographic citations: towards new u...
 
Challenges and opportunities for academic libraries
Challenges and opportunities for academic librariesChallenges and opportunities for academic libraries
Challenges and opportunities for academic libraries
 
Intro to Linked Open Data in Libraries Archives & Museums.
Intro to Linked Open Data in Libraries Archives & Museums.Intro to Linked Open Data in Libraries Archives & Museums.
Intro to Linked Open Data in Libraries Archives & Museums.
 
Collections unbound: collection directions and the RLUK collective collection
Collections unbound: collection directions and the RLUK collective collectionCollections unbound: collection directions and the RLUK collective collection
Collections unbound: collection directions and the RLUK collective collection
 
Documenting Ferguson: Building a community digital repository
Documenting Ferguson: Building a community digital repositoryDocumenting Ferguson: Building a community digital repository
Documenting Ferguson: Building a community digital repository
 
Connecting the Dots: Linking Digitized Collections Across Metadata Silos
Connecting the Dots: Linking Digitized Collections Across Metadata SilosConnecting the Dots: Linking Digitized Collections Across Metadata Silos
Connecting the Dots: Linking Digitized Collections Across Metadata Silos
 
Pilots & Partnerships: University Academic Computing and University Libraries...
Pilots & Partnerships: University Academic Computing and University Libraries...Pilots & Partnerships: University Academic Computing and University Libraries...
Pilots & Partnerships: University Academic Computing and University Libraries...
 
Using Europeana for learning & teaching: EMMA MOOC “Digital library in princ...
Using Europeana for learning & teaching:  EMMA MOOC “Digital library in princ...Using Europeana for learning & teaching:  EMMA MOOC “Digital library in princ...
Using Europeana for learning & teaching: EMMA MOOC “Digital library in princ...
 
Burke, "Discovery Tools - Changing the Nature of Collections in an Item-cente...
Burke, "Discovery Tools - Changing the Nature of Collections in an Item-cente...Burke, "Discovery Tools - Changing the Nature of Collections in an Item-cente...
Burke, "Discovery Tools - Changing the Nature of Collections in an Item-cente...
 
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
 
MichalkoLibrary Collaboration iSchoolShareable-1
MichalkoLibrary Collaboration iSchoolShareable-1MichalkoLibrary Collaboration iSchoolShareable-1
MichalkoLibrary Collaboration iSchoolShareable-1
 
Organizing a DPLA Service Hub in Missouri
Organizing a DPLA Service Hub in MissouriOrganizing a DPLA Service Hub in Missouri
Organizing a DPLA Service Hub in Missouri
 
[[edit]] this GLAM
[[edit]] this GLAM[[edit]] this GLAM
[[edit]] this GLAM
 
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ...
 
Islandora and Omeka: Building U of T Digital Collections & Exhibits
Islandora and Omeka: Building U of T Digital Collections & ExhibitsIslandora and Omeka: Building U of T Digital Collections & Exhibits
Islandora and Omeka: Building U of T Digital Collections & Exhibits
 

Semelhante a Solr powered libraries a survey of the world's knowledge bases

ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
Jon Voss
 
Agile resources on the open web …. a global digital library
Agile resources on the open web …. a global digital libraryAgile resources on the open web …. a global digital library
Agile resources on the open web …. a global digital library
Jisc
 
Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008
Karen S Calhoun
 
Charper.lawdi.20130531
Charper.lawdi.20130531Charper.lawdi.20130531
Charper.lawdi.20130531
charper
 
Fuller Disclosure: Getting More Collections into the Network Flow
Fuller Disclosure: Getting More Collections into the Network FlowFuller Disclosure: Getting More Collections into the Network Flow
Fuller Disclosure: Getting More Collections into the Network Flow
kramsey
 
Virtual systems
Virtual systemsVirtual systems
Virtual systems
jsutclif
 

Semelhante a Solr powered libraries a survey of the world's knowledge bases (20)

ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
 
Providing First World Library services By using Koha, DSpace, vufind and Drupal
Providing First World Library services By using  Koha, DSpace, vufind and DrupalProviding First World Library services By using  Koha, DSpace, vufind and Drupal
Providing First World Library services By using Koha, DSpace, vufind and Drupal
 
"In the Early Days of a Better Nation": Enhancing the power of metadata today...
"In the Early Days of a Better Nation": Enhancing the power of metadata today..."In the Early Days of a Better Nation": Enhancing the power of metadata today...
"In the Early Days of a Better Nation": Enhancing the power of metadata today...
 
Library discovery: past, present and some futures
Library discovery: past, present and some futuresLibrary discovery: past, present and some futures
Library discovery: past, present and some futures
 
Agile resources on the open web …. a global digital library
Agile resources on the open web …. a global digital libraryAgile resources on the open web …. a global digital library
Agile resources on the open web …. a global digital library
 
Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008Calhoun Rbms Rev June 2008
Calhoun Rbms Rev June 2008
 
Charper.lawdi.20130531
Charper.lawdi.20130531Charper.lawdi.20130531
Charper.lawdi.20130531
 
Open Source ILS Add-Ons
Open Source ILS Add-OnsOpen Source ILS Add-Ons
Open Source ILS Add-Ons
 
NISO Virtual Conference: Web-Scale Discovery Services: Transforming Access to...
NISO Virtual Conference: Web-Scale Discovery Services: Transforming Access to...NISO Virtual Conference: Web-Scale Discovery Services: Transforming Access to...
NISO Virtual Conference: Web-Scale Discovery Services: Transforming Access to...
 
Fuller Disclosure: Getting More Collections into the Network Flow
Fuller Disclosure: Getting More Collections into the Network FlowFuller Disclosure: Getting More Collections into the Network Flow
Fuller Disclosure: Getting More Collections into the Network Flow
 
Data Publishing in Archaeozoology
Data Publishing in ArchaeozoologyData Publishing in Archaeozoology
Data Publishing in Archaeozoology
 
Boundless Opportunity
Boundless OpportunityBoundless Opportunity
Boundless Opportunity
 
The Open Access Community, and OAIster
The Open Access Community, and OAIsterThe Open Access Community, and OAIster
The Open Access Community, and OAIster
 
LOD/LAM Presentation
LOD/LAM PresentationLOD/LAM Presentation
LOD/LAM Presentation
 
Open access (1)
Open access (1)Open access (1)
Open access (1)
 
Virtual systems
Virtual systemsVirtual systems
Virtual systems
 
Oair du
Oair duOair du
Oair du
 
Scholarship in a connected world: New ways to know, new ways to show
Scholarship in a connected world: New ways to know, new ways to showScholarship in a connected world: New ways to know, new ways to show
Scholarship in a connected world: New ways to know, new ways to show
 
Networking Repositories, Optimizing Impact: Georgia Knowledge Repository Meeting
Networking Repositories, Optimizing Impact: Georgia Knowledge Repository MeetingNetworking Repositories, Optimizing Impact: Georgia Knowledge Repository Meeting
Networking Repositories, Optimizing Impact: Georgia Knowledge Repository Meeting
 
Open access
Open accessOpen access
Open access
 

Mais de lucenerevolution

Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
lucenerevolution
 

Mais de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 

Solr powered libraries a survey of the world's knowledge bases

  • 1. © Copyright 2013 LucidWorks Solr Powered Libraries: A survey of the world's knowledge bases May 2, 2013 Presented by Erik Hatcher Thursday, May 2, 13
  • 2. © 2013 LucidWorks Abstract Using Apache Lucene and Solr search technologies, information and knowledge have become vastly more searchable, findable, and accessible. Because scholars and researchers are some of the most demanding users of search systems, the problems encountered by the implementers are complex. For example, many of the applications built on these technologies also thrive on intentionally designed-in serendipitous discovery capabilities, bringing to light previously unknown, yet related and potentially interesting, content. Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace "open source" and community improving contributions as core principles, making a lovely synergy with the power, features, and community-driven ecosystem provided by Lucene and Solr. This talk will introduce you to several Solr powered library-related systems, detail how they work, and leave you with lessons learned that can be applied to your applications. 2 Thursday, May 2, 13
  • 3. © 2013 LucidWorks Real Solar Powered Library ! •http://www.ktsm.com/news/texas-library-runs-sunshine 3 Thursday, May 2, 13
  • 4. © 2013 LucidWorks Card carrying library geek •Applied Research in Patacriticism (ARP) - Rossetti Archive: http://www.rossettiarchive.org - NINES: http://www.nines.org/ - Collex: http://www.collex.org •Blacklight - originated as an implementation of Solr Flare •Presentations - http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013 - Library of Congress: "Solr Powered Libraries" (2007) »http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113 - EBTI/CBETA Conference 2008 - Publication: “Library 2.0 Initiatives in Academic Libraries” •Windsor Lucene Summit •eIFL-FOSS 4 Thursday, May 2, 13
  • 5. © 2013 LucidWorks Rossetti Archive 5 Thursday, May 2, 13
  • 7. © 2013 LucidWorks Card catalog •the original inverted index 7 http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg Thursday, May 2, 13
  • 8. © 2013 LucidWorks •http://openlibrary.org/ - project of the Internet Archive •Goal: "A (community editable) web page for every book" 8 Thursday, May 2, 13
  • 9. © 2013 LucidWorks dp.la - Digital Public Library of America 9 Lucene/ElasticSearch Powered Thursday, May 2, 13
  • 10. © 2013 LucidWorks Wikimedia/Wikipedia/MediaWiki •Solr powered: translation memory service, GeoData extension, etc •"heavily modified Lucene" powers main site search currently 10 Thursday, May 2, 13
  • 11. © 2013 LucidWorks HathiTrust • "partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future." • 10.5M books, 12TB OCR+metadata, hundreds of languages - "Books are different" - http://code4lib.org/conference/2013/burton-west • http://www.hathitrust.org/blogs/large-scale-search - http://www.hathitrust.org/blogs/large-scale-search/too-many-words - "org.apache.solr.common.SolrException: Impossible Exception" - CommonGrams - word segmentation: autoGeneratePhraseQueries="false" • HathiTrust Research Center - The infrastructure includes an entrance portal, search and collection-building tools (using Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus (more than 3 million volumes). In addition to the production services, the HTRC offers a development “sandbox”. The sandbox runs against non-Google scanned content (about 260,000 volumes) and provides a test-bed for interested researchers to experiment with writing their own algorithms for use in the HTRC infrastructure. 11 Thursday, May 2, 13
  • 12. Smithsonian Institution
      • http://collections.si.edu
      • Many disparate data sources:
        - 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical Observatory, research centers in Panama, Boston, New York, Maryland, and Virginia
      • "Documents" of all varieties:
        - Photographs, paintings, manuscripts, letters, postage stamps, scientific specimens, rockets, airplanes, postcards, sound recordings, posters, decorative arts, ceramics, maps, sculptures, publication papers, books, trade catalogs, etc.
      • User tagging, negative/exclude filtering, DIH SolrEntityProcessor
      • http://bit.ly/13P41YJ
        - http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
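As a rough illustration of negative/exclude filtering, the sketch below builds Solr request parameters in which an excluded facet value becomes a negated `fq` clause. The field name `object_type`, the values, and the `build_params` helper are hypothetical; the slide does not say how collections.si.edu actually constructs its requests.

```python
from urllib.parse import urlencode


def build_params(user_query, include=None, exclude=None):
    """Build Solr request parameters with positive and negated filter queries."""
    params = [("q", user_query)]
    for field, value in (include or []):
        # Keep only documents whose field matches the value.
        params.append(("fq", f'{field}:"{value}"'))
    for field, value in (exclude or []):
        # A leading minus negates the clause: drop documents that match.
        params.append(("fq", f'-{field}:"{value}"'))
    return params


# Hypothetical field and values, for illustration only. Values containing
# double quotes would need escaping before being placed in the clause.
params = build_params(
    "apollo",
    include=[("object_type", "Photographs")],
    exclude=[("object_type", "Postcards")],
)
query_string = urlencode(params)
```

Each filter travels as its own `fq` parameter, so Solr can cache the include and exclude filters independently of the main query.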
  • 15. SerialsSolutions Summon
      • http://www.serialssolutions.com/en/services/summon
      • SaaS, single unified index, match & merge
  • 16. Astrophysics Data System Labs
      • Smithsonian, NASA, Harvard
      • http://adslabs.org
      http://code4lib.org/conference/2013/luker
  • 17. vufind.org
      • Powers main HathiTrust UI (currently) and many more
        - see http://vufind.org/wiki/installation_status
  • 19. "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for any Solr index. Blacklight provides a default user interface which is customizable via the standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous data, allowing different information displays for different types of objects." - http://projectblacklight.org
      • Founded at the University of Virginia (2007): search.lib.virginia.edu
        - UV-A solar radiation == blacklight
      • Initial contributors: UVa, Stanford, JHU, WGBH
      • University of Hull, United States Holocaust Memorial Museum, University of Wisconsin-Madison, Tufts, Australian gov't (Natural Resource Management), Penn State's ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University, Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and Issue Campaign Exchange, a one-stop web-based public library of progressive state and local laws), and many more
      • http://projecthydra.org/ uses Blacklight as UI component
  • 20. searchworks at Stanford
  • 21. Advanced search at Stanford's searchworks
  • 22. searchworks: Mapping Text Boxes to Solr query pieces
      • http://code4lib.org/conference/2010/dushay_keck
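One plausible way to map advanced-search text boxes to Solr query pieces is to turn each non-empty box into a nested query with local params and join the pieces with a boolean operator. The sketch below assumes this approach; the box names, the `$qf_<box>` request parameters, and the `advanced_query` helper are hypothetical, and the slide does not confirm that searchworks builds its queries exactly this way.

```python
def advanced_query(boxes, op="AND"):
    """Turn {box_name: user_text} into one Solr query made of nested dismax pieces."""
    pieces = []
    for box, text in boxes.items():
        text = text.strip()
        if not text:
            continue  # Empty boxes contribute nothing to the query.
        # Escape backslashes and double quotes so the user text stays
        # inside the quoted nested query.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        # $qf_<box> is assumed to be a request parameter (hypothetical name)
        # listing the fields and boosts to search for that box.
        pieces.append(f'_query_:"{{!dismax qf=$qf_{box}}}{escaped}"')
    return f" {op} ".join(pieces)


q = advanced_query({"title": "origin of species", "author": "darwin", "subject": ""})
```

Keeping the field lists in request parameters such as `qf_title` means relevancy tuning can happen in the Solr configuration without touching the form-handling code.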
  • 24. Rock and Roll!
      • m/
  • 25. Community and Resources
      • code4lib:
        - http://www.code4lib.org/
      • HathiTrust folks
        - http://www.hathitrust.org/blogs/large-scale-search
        - http://robotlibrarian.billdueber.com/
      • http://bighumanities.net/
        - The Workshop on Big Humanities will be held in conjunction with the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), which will take place between 6-9 October 2013 in Silicon Valley, California, USA, and which provides a leading international forum for disseminating the latest research in the growing field of “big data”.