Accessing Your Library Book Collections Using Solr

By: Engy Morsy
Software project manager, Bibliotheca Alexandrina
engy.morsy@bibalex.org
5/14/12
http://dar.bibalex.org

BA & Solr

http://bibalex.org

http://wamcp.bibalex.org

http://ssc.bibalex.org

http://dar.bibalex.org
  
Introductory Video
  
Agenda

• Brief introduction to DAR architecture
• Indexing the book collection
• Searching across metadata and content
• Faceting
• Searching book content
• Solr with personalization
• Future
• Q&A
  
About 1.5 million books
  
Digital Assets Repository
  
Book site

• Approximately 260,000 books
• Nearly 220,000 books published online
• About 1.5 TB of content
• Average book size: 6 MB
• Daily indexing rate: about 150 books
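
The book count, average size, and total content figures are roughly consistent with one another, as a quick back-of-the-envelope check suggests. The sketch assumes decimal units (1 TB = 1,000,000 MB); the variable names are illustrative only.

```python
# Back-of-the-envelope check of the collection figures quoted on the slide.
# Assumption: decimal units, i.e. 1 TB = 1,000,000 MB.
TOTAL_BOOKS = 260_000        # "approximately 260,000 books"
AVG_BOOK_SIZE_MB = 6         # "average book size 6 MB"
MB_PER_TB = 1_000_000

estimated_tb = TOTAL_BOOKS * AVG_BOOK_SIZE_MB / MB_PER_TB
print(f"Estimated content size: {estimated_tb:.2f} TB")
```

The estimate lands close to the "about 1.5 TB" quoted on the slide.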
  
What do we want…?

• Allow simple and advanced search across metadata and content in 5 languages
  
Simple Search
  
What do we want…?

• Allow simple and advanced search across metadata and content in 5 languages
• Faceting
  
What do we want…?

• Allow simple and advanced search across metadata and content in 5 languages
• Faceting
• Annotations
  
Text Underlining

Text Highlighting

Adding Sticky Notes
  
What do we want…?

• Allow simple and advanced search across metadata and content in 5 languages
• Faceting
• Annotations
• Personalization
  
Arranging Books in Bookshelves

Submitting Comments

Rating

Embedding

Sharing the book link in other social networks
  
What lies beneath!!
  
Book site indices

[Diagram: a query is sent to five per-language indexes: AR, EN, FR, IT, SP.]
  	
Indexing Book Collection

• Index per language
• A document in the content index corresponds to a page in a book
• Maintain a field to distinguish between metadata records and content records (e.g. SolrType)
• Use static fields for all content index documents (e.g. PageID, etc.)
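
The indexing scheme on this slide can be sketched as plain dictionaries of the kind a Solr client would post. This is a minimal illustration, not the actual DAR schema: only SolrType and PageID appear on the slide, while the id, BookID, Title and Text fields, the "books_<lang>" core naming, and both helper functions are assumptions.

```python
from typing import Dict, List

# Languages shown on the "Book site indices" slide.
LANGUAGES = ("ar", "en", "fr", "it", "sp")


def core_for(language: str) -> str:
    """Pick the per-language index; the 'books_<lang>' naming is hypothetical."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return f"books_{language}"


def build_documents(book_id: str, title: str, pages: List[str]) -> List[Dict[str, str]]:
    """Return one metadata record plus one content record per page."""
    docs = [{"id": book_id, "SolrType": "Meta", "BookID": book_id, "Title": title}]
    for number, text in enumerate(pages, start=1):
        docs.append({
            "id": f"{book_id}_p{number}",
            "SolrType": "Content",   # distinguishes page records from metadata
            "BookID": book_id,       # hypothetical link back to the parent book
            "PageID": str(number),   # static field present on every content record
            "Text": text,
        })
    return docs


docs = build_documents("b1", "Mobile Technology", ["page one text", "cloud computing basics"])
print(core_for("en"), len(docs))
```

A two-page book yields three records: one of type Meta and two of type Content.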
  
What is the problem with this solution?
  
Problem for content search

Example: Advanced Search
    search for
        Title: Mobile Technology
        AND
        Content: “cloud computing”
  
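Proposed solution

[Diagram: the title query (Title: Mobile Technology) runs against records with SolrType Meta and returns result IDs; the content query (Content: “cloud computing”) runs against records with SolrType Content and returns parent book IDs; the two ID sets are intersected, and the index is queried again to produce the final result and the facet result.]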
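
The flow in the proposed solution can be sketched with a toy in-memory stand-in for the index. This is an illustration under stated assumptions, not the DAR implementation: the records, the BookID, Title, Text and Category field names, and the substring matching are all hypothetical simplifications of what Solr would do.

```python
from collections import Counter

# Toy stand-in for one index that mixes metadata and page records.
# Field names other than SolrType and PageID are assumptions made for illustration.
INDEX = [
    {"SolrType": "Meta", "BookID": "b1", "Title": "Mobile Technology", "Category": "Computing"},
    {"SolrType": "Meta", "BookID": "b2", "Title": "Mobile Technology Today", "Category": "Business"},
    {"SolrType": "Content", "BookID": "b1", "PageID": "7", "Text": "notes on cloud computing"},
    {"SolrType": "Content", "BookID": "b3", "PageID": "2", "Text": "cloud computing primer"},
]


def book_ids(solr_type, field, phrase):
    """Return the set of book IDs whose records of the given type contain the phrase."""
    return {
        doc["BookID"]
        for doc in INDEX
        if doc["SolrType"] == solr_type and phrase.lower() in doc.get(field, "").lower()
    }


# Pass 1: metadata query. Pass 2: content query, reduced to parent book IDs.
meta_hits = book_ids("Meta", "Title", "Mobile Technology")
content_hits = book_ids("Content", "Text", "cloud computing")

# Intersect the two ID sets, then go back to the metadata records for facets.
final_ids = meta_hits & content_hits
facets = Counter(
    doc["Category"]
    for doc in INDEX
    if doc["SolrType"] == "Meta" and doc["BookID"] in final_ids
)
print(sorted(final_ids), dict(facets))
```

The extra pass over the metadata records at the end is the step that costs additional processing time.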
  
The problem is…

• Can’t get the faceting result directly from the content index
• Need to query the metadata index in order to get the final facet result

Processing time!!!
  
Solution…!

• Metadata denormalization
  – Denormalize metadata into content index
  
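Proposed solution

[Diagram: the title query (Title: Mobile Technology) runs against records with SolrType Meta and returns result IDs; the content query (Content: “cloud computing”) runs against records with SolrType Content and returns the facet result; the two are intersected to give the final result.]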
  
Problem for content search

• Metadata denormalization…

Worst choice!
  • Re-indexing for changes in metadata
  • Data processing is required
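
Why denormalization forces re-indexing can be seen in a small sketch: once the metadata fields are copied onto every page record, a single metadata correction touches every page of the book. The field names (BookID, Title, Category, Text) and the denormalize helper are hypothetical.

```python
def denormalize(metadata, pages):
    """Copy the book's metadata fields onto every page record."""
    return [
        {**metadata, "SolrType": "Content", "PageID": str(number), "Text": text}
        for number, text in enumerate(pages, start=1)
    ]


pages = ["intro", "cloud computing", "summary"]
book = {"BookID": "b1", "Title": "Mobile Technology", "Category": "Computing"}
page_docs = denormalize(book, pages)

# One corrected metadata field means rebuilding and re-indexing every page record.
corrected = {**book, "Category": "Telecommunications"}
reindexed = denormalize(corrected, pages)
print(f"{len(reindexed)} page records re-indexed for one metadata change")
```

For a book of several hundred pages, the same one-field change would mean several hundred document updates.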
  
New Solution
  
Indexing Metadata

• Index per language
• Separate content and metadata indexes
• A Text field holds the whole book content in the metadata index
  – The maxFieldLength has been set to its maximum
    • e.g. 2147483647
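
The idea of carrying the whole book in one field of the metadata record can be sketched as follows. In Solr releases of that period, maxFieldLength capped how many tokens of a field were indexed, so a whole-book field needs a very high limit; the value on the slide is the largest signed 32-bit integer. The BookID, Title and Text field names and the helper function are assumptions.

```python
# Largest value of a signed 32-bit integer, the figure quoted on the slide.
MAX_FIELD_LENGTH = 2**31 - 1


def build_metadata_document(book_id, title, pages):
    """Build one metadata record whose Text field carries the whole book."""
    return {
        "BookID": book_id,          # hypothetical field name
        "Title": title,             # hypothetical field name
        "Text": "\n".join(pages),   # full content, so one query can cover both
    }


doc = build_metadata_document("b1", "Mobile Technology", ["intro", "cloud computing"])
print(MAX_FIELD_LENGTH, len(doc["Text"].split()))
```

With the full text on the metadata record, a title criterion and a content criterion can be evaluated against the same document.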
  
Back to the example

Example: Advanced Search
    search for
        Title: Mobile Technology
        AND
        Content: “cloud computing”
  
Solution

[Diagram: the title criterion (Title: Mobile Technology) and the content criterion (Content: “cloud computing”) are sent together to the Meta index, which returns the facet result.]
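
A single combined request against the metadata index might look like the sketch below. The q, facet and facet.field parameters are standard Solr request parameters; the Title, Text, Category and Language field names and the helper function are assumptions, since the slides do not show the schema.

```python
from urllib.parse import parse_qs, urlencode


def build_advanced_search(title, content_phrase, facet_fields):
    """Return the query string for one combined metadata + content search."""
    params = [
        ("q", f'Title:"{title}" AND Text:"{content_phrase}"'),
        ("facet", "true"),
    ]
    params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


query_string = build_advanced_search(
    "Mobile Technology", "cloud computing", ["Category", "Language"]
)
# Decode again to show the query that Solr would receive.
print(parse_qs(query_string)["q"][0])
```

Because both criteria hit the same index, the facet counts come back in the same response and no ID intersection is needed.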
  
Solution

[Diagram: the title query (Title: Mobile Technology) goes to the Meta index and the content query (Content: “cloud computing”) goes to the Content index; the intersection of their results is sent to the Meta index to get the facet result.]
  
 
              	
  
 
                	
Separate indexes vs. all in one

• Separate indexes
  + Indexing time
  + Index size
  - Processing results (facets, etc.)
  - Scoring

• One index
  - Index size
  - Indexing time
  + Scoring
  + Processing time
  
Book content index

[Diagram: five per-language content indexes: AR, EN, FR, IT, SP.]
  
  
Searching

• Simple and advanced search
  – Cache only the resulting IDs
• Highlighting search results
  – Get the full search result and highlight per page result
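
One plausible reading of "cache the IDs only" is to keep the matching book IDs per query and fetch full records only for the page being displayed. The sketch below illustrates that reading; the IdCache class, the fake_search stand-in and the BookID field are hypothetical and not taken from the DAR code.

```python
class IdCache:
    """Keep only the matching book IDs per query, not the full result documents."""

    def __init__(self, search_fn):
        self._search_fn = search_fn   # hypothetical callable that queries the index
        self._ids = {}
        self.backend_calls = 0

    def result_ids(self, query):
        """Run the search once per distinct query and remember only the IDs."""
        if query not in self._ids:
            self.backend_calls += 1
            self._ids[query] = tuple(doc["BookID"] for doc in self._search_fn(query))
        return self._ids[query]

    def page(self, query, start, rows):
        """Return the slice of cached IDs for one results page."""
        return self.result_ids(query)[start:start + rows]


def fake_search(query):
    # Stand-in for a real index call; returns full documents.
    return [{"BookID": f"b{n}", "Title": f"Book {n}"} for n in range(1, 6)]


cache = IdCache(fake_search)
first = cache.page("mobile", 0, 2)
second = cache.page("mobile", 2, 2)
print(first, second, cache.backend_calls)
```

Paging through the results reuses the cached IDs, so the index is queried once per distinct query.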
  
Book Content Search

• Search using
  – Search query
  – Book ID
  – List of pages’ IDs
• Highlights
• Annotations
  – Saved currently in DB
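
The three inputs listed on this slide map naturally onto a query plus two filter queries, with highlighting switched on. The q, fq, hl and hl.fl parameters are standard Solr request parameters; the Text, BookID and PageID field names and the helper function are assumptions.

```python
from urllib.parse import parse_qs, urlencode


def build_book_content_query(query, book_id, page_ids):
    """Restrict a content search to one book and a set of its pages, with highlighting."""
    page_filter = " OR ".join(str(page_id) for page_id in page_ids)
    params = [
        ("q", f'Text:"{query}"'),
        ("fq", f"BookID:{book_id}"),         # limit to the open book
        ("fq", f"PageID:({page_filter})"),   # limit to the listed pages
        ("hl", "true"),                      # ask Solr for highlighted snippets
        ("hl.fl", "Text"),                   # field to highlight
    ]
    return urlencode(params)


content_qs = build_book_content_query("cloud computing", "b1", [7, 8, 9])
print(parse_qs(content_qs)["fq"])
```

Filter queries narrow the candidate set without affecting relevance scoring, which suits a search scoped to a single book.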
  
Faceting

• Fixed facet fields
  – Category, sub-category, language, etc.
  – Stored, indexed, exact fields
• Process facets from different indices
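
Processing facets from different indices amounts to summing the counts each index reports for the same facet value. The sketch below shows that merge for a Category facet; the counts and the merge_facets helper are made up for illustration.

```python
from collections import Counter


def merge_facets(per_index_counts):
    """Sum facet counts for the same value across several indexes."""
    merged = Counter()
    for counts in per_index_counts:
        merged.update(counts)   # Counter.update adds counts rather than replacing them
    return merged


# Hypothetical Category facet counts returned by two per-language indexes.
english = {"History": 12, "Science": 5}
arabic = {"History": 30, "Literature": 8}

merged = merge_facets([english, arabic])
print(merged.most_common())
```

Summing is only valid when each book lives in exactly one of the indexes, which holds for an index-per-language layout where a book has a single language.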
  
Personalization

• Using separate index of personalization
  – Different Solr fields for different languages
  – Search across all fields
• Saving in both Solr and DB
• Indexing tags, rating and comments using type field
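
Records for the personalization index could be shaped as in the sketch below, with a type field separating tags, ratings and comments, and free text stored in a language-specific field. All field names (UserID, BookID, Type, Rating, Text_<lang>) and the helper function are assumptions; the slides do not show the actual schema.

```python
def personalization_doc(user_id, book_id, kind, value, language):
    """Build one record for the personalization index."""
    if kind not in {"tag", "rating", "comment"}:
        raise ValueError(f"unknown personalization type: {kind}")
    doc = {"UserID": user_id, "BookID": book_id, "Type": kind}
    if kind == "rating":
        doc["Rating"] = int(value)
    else:
        # Free text goes to a language-specific field, e.g. Text_en or Text_ar.
        doc[f"Text_{language}"] = value
    return doc


docs = [
    personalization_doc("u1", "b1", "tag", "networking", "en"),
    personalization_doc("u1", "b1", "rating", 4, "en"),
    personalization_doc("u2", "b1", "comment", "très utile", "fr"),
]
print([d["Type"] for d in docs])
```

Keeping one field per language lets each field use an analyzer suited to that language while a search can still span all of them.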
  
Future

• Book mobile application using Solr
• Using Hadoop
• Indexing other digital media (maps, audio, video)
  
Contact

engy.morsy@bibalex.org
Library website: http://bibalex.org
Digital Asset Repository: http://dar.bibalex.org
  
Thank you…
  

Mais conteúdo relacionado

Semelhante a How to Access Your Library Book Collections Using Solr

Presentation distro recipes-2013
Presentation distro recipes-2013Presentation distro recipes-2013
Presentation distro recipes-2013olberger
 
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Anne Nicolas
 
Flink Apachecon Presentation
Flink Apachecon PresentationFlink Apachecon Presentation
Flink Apachecon PresentationGyula Fóra
 
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...datascienceiqss
 
Leyline: A provenance-based desktop search
Leyline: A provenance-based desktop searchLeyline: A provenance-based desktop search
Leyline: A provenance-based desktop searchSoroush Ghorashi
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigLester Martin
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012François Belleau
 
Bits and Pieces from the UPEI Experience
Bits and Pieces from the UPEI ExperienceBits and Pieces from the UPEI Experience
Bits and Pieces from the UPEI ExperienceEvergreen ILS
 
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Mark Wilkinson
 
The current architecture of TYPO3 5.0
The current architecture of TYPO3 5.0The current architecture of TYPO3 5.0
The current architecture of TYPO3 5.0Robert Lemke
 
RDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use itRDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use itJose Luis Lopez Pino
 
SemWeb Fundamentals - Info Linking & Layering in Practice
SemWeb Fundamentals - Info Linking & Layering in PracticeSemWeb Fundamentals - Info Linking & Layering in Practice
SemWeb Fundamentals - Info Linking & Layering in PracticeDan Brickley
 
F/LOSS in Norwegian libraries
F/LOSS in Norwegian librariesF/LOSS in Norwegian libraries
F/LOSS in Norwegian librariesLibriotech
 
Unboxing ML Models... Plus CoreML!
Unboxing ML Models... Plus CoreML!Unboxing ML Models... Plus CoreML!
Unboxing ML Models... Plus CoreML!Ray Deck
 
Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Bradley Allen
 
Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.orgNorman Morrison
 
Ks2008 Semanticweb In Action
Ks2008 Semanticweb In ActionKs2008 Semanticweb In Action
Ks2008 Semanticweb In ActionRinke Hoekstra
 

Semelhante a How to Access Your Library Book Collections Using Solr (20)

Presentation distro recipes-2013
Presentation distro recipes-2013Presentation distro recipes-2013
Presentation distro recipes-2013
 
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
 
Publishing Linked Data from RDB
Publishing Linked Data from RDBPublishing Linked Data from RDB
Publishing Linked Data from RDB
 
IKON Final Presentation
IKON Final PresentationIKON Final Presentation
IKON Final Presentation
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
 
Flink Apachecon Presentation
Flink Apachecon PresentationFlink Apachecon Presentation
Flink Apachecon Presentation
 
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descript...
 
Leyline: A provenance-based desktop search
Leyline: A provenance-based desktop searchLeyline: A provenance-based desktop search
Leyline: A provenance-based desktop search
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs Pig
 
Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012Bio2RDF presentation at Combine 2012
Bio2RDF presentation at Combine 2012
 
Bits and Pieces from the UPEI Experience
Bits and Pieces from the UPEI ExperienceBits and Pieces from the UPEI Experience
Bits and Pieces from the UPEI Experience
 
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
 
The current architecture of TYPO3 5.0
The current architecture of TYPO3 5.0The current architecture of TYPO3 5.0
The current architecture of TYPO3 5.0
 
RDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use itRDFa: introduction, comparison with microdata and microformats and how to use it
RDFa: introduction, comparison with microdata and microformats and how to use it
 
SemWeb Fundamentals - Info Linking & Layering in Practice
SemWeb Fundamentals - Info Linking & Layering in PracticeSemWeb Fundamentals - Info Linking & Layering in Practice
SemWeb Fundamentals - Info Linking & Layering in Practice
 
F/LOSS in Norwegian libraries
F/LOSS in Norwegian librariesF/LOSS in Norwegian libraries
F/LOSS in Norwegian libraries
 
Unboxing ML Models... Plus CoreML!
Unboxing ML Models... Plus CoreML!Unboxing ML Models... Plus CoreML!
Unboxing ML Models... Plus CoreML!
 
Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)Semantic Search using RDF Metadata (SemTech 2005)
Semantic Search using RDF Metadata (SemTech 2005)
 
Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.org
 
Ks2008 Semanticweb In Action
Ks2008 Semanticweb In ActionKs2008 Semanticweb In Action
Ks2008 Semanticweb In Action
 

Mais de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Mais de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Último

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
How to Access Your Library Book Collections Using Solr

  • 1. Accessing Your Library Book Collections Using Solr. By: Engy Morsy, Software project manager, Bibliotheca Alexandrina, engy.morsy@bibalex.org. 5/14/12, http://dar.bibalex.org
  • 2. BA & Solr
  • 3. http://bibalex.org
  • 4. http://wamcp.bibalex.org
  • 5. http://ssc.bibalex.org
  • 6. http://dar.bibalex.org
  • 7. Introductory Video
  • 8. Agenda
      • Brief introduction to DAR architecture
      • Indexing the book collection
      • Searching across metadata and content
      • Faceting
      • Searching book content
      • Solr with personalization
      • Future
      • Q&A
  • 9. About 1.5 million books
  • 10. Digital Assets Repository
  • 11. Digital Assets Repository
  • 12. Book site
      • Approximately 260,000 books
      • Nearly 220,000 books published online
      • About 1.5 TB of content
      • Average book size 6 MB
      • Daily indexing rate is about 150 books
  • 13. What do we want…?
      • Allow simple and advanced search across metadata and content in 5 languages
  • 14. Simple Search
  • 15. What do we want…?
      • Allow simple and advanced search across metadata and content in 5 languages
      • Faceting
  • 20. What do we want…?
      • Allow simple and advanced search across metadata and content in 5 languages
      • Faceting
      • Annotations
  • 25. What do we want…?
      • Allow simple and advanced search across metadata and content in 5 languages
      • Faceting
      • Annotations
      • Personalization
  • 26. Arranging books in bookshelves
  • 30. Sharing the book link in other social networks
  • 31. What lies beneath!!
  • 32. Book site indices (diagram): a query is sent to five per-language indexes: AR, EN, FR, IT, SP
  • 33. Indexing Book Collection
      • Index per language
      • A document in the content index corresponds to a page in a book
      • Maintain a field to distinguish between metadata records and content records (e.g. SolrType)
      • Use static fields for all content indexes (e.g. PageID, etc.)
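A minimal sketch of the record layout described on slide 33, assuming one metadata record per book and one content record per page. Only SolrType and PageID come from the slides; the index naming scheme and the id, BookID, Title and Text field names are hypothetical.

```python
def build_documents(book, language):
    """Return (index_name, docs): one metadata record plus one record per page."""
    # Hypothetical naming scheme for the per-language index.
    index_name = f"books_{language}"
    meta = {"id": book["id"], "SolrType": "Meta", "Title": book["title"]}
    pages = [
        {
            "id": f"{book['id']}_p{number}",
            "SolrType": "Content",   # distinguishes content records from metadata
            "BookID": book["id"],    # hypothetical link back to the parent book
            "PageID": number,
            "Text": text,
        }
        for number, text in enumerate(book["pages"], start=1)
    ]
    return index_name, [meta] + pages
```

A two-page book therefore yields three records in the index for its language: one with SolrType "Meta" and two with SolrType "Content".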
  • 34. What is the problem with this solution?
  • 35. Problem for content search. Example: an advanced search for Title: Mobile Technology AND Content: “cloud computing”
  • 36. Proposed solution (diagram): the query Title: Mobile Technology runs against the index with SolrType Meta and returns result IDs; the query Content: “cloud computing” runs against the index with SolrType Content and returns parent book IDs; the two ID sets are intersected to get the final result, and the index is queried again for the facet result
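The intersection step in the slide 36 diagram can be sketched as follows, assuming the metadata query returns a ranked list of book IDs and each content hit carries the ID of its parent book. The BookID field name and the function name are hypothetical.

```python
def intersect_results(meta_ids, content_hits):
    """Keep the books that match the metadata query and have a matching page."""
    # Many pages of the same book may match, so collapse hits to parent IDs.
    parent_ids = {hit["BookID"] for hit in content_hits}
    # Preserve the ranking returned by the metadata query.
    return [book_id for book_id in meta_ids if book_id in parent_ids]
```

The surviving IDs still have to be sent back to the metadata index to compute facets, which is the extra round trip slide 37 complains about.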
  • 37. The problem is…
      • Can’t get the faceting result directly from the content index
      • Need to query the metadata index in order to get the final facet result
      • Processing time!!!
  • 38. Solution…!
      • Metadata denormalization
          – Denormalize metadata into content index
  • 39. Proposed solution (diagram): as on slide 36, but without the separate parent book IDs step and the extra index query; result IDs from the SolrType Meta query and the SolrType Content query are intersected to give the final result and the facet result
  • 40. Problem for content search
      • Metadata denormalization… Worst choice!
      • Re-indexing for changes in metadata
      • Data processing is required
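To illustrate why slide 40 calls denormalization the worst choice, here is a small sketch that copies a book's metadata onto every page record. The field names (Title, Category, BookID, Text) and both function names are assumptions for illustration only.

```python
def denormalize_pages(book):
    """Copy the book's metadata fields onto every page record."""
    metadata = {"Title": book["title"], "Category": book["category"]}
    return [
        {"id": f"{book['id']}_p{number}", "BookID": book["id"],
         "PageID": number, "Text": text, **metadata}
        for number, text in enumerate(book["pages"], start=1)
    ]


def records_to_reindex(book):
    """With denormalized metadata, one metadata edit touches every page record."""
    return len(denormalize_pages(book))
```

Correcting a single title on a 300-page book would mean rebuilding 300 records instead of one.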
  • 41. New Solution
  • 42. Indexing Metadata
      • Index per language
      • Separate content and metadata index
      • Text field holds the whole book content in the metadata index
          – The maxFieldLength has been set to maximum, e.g. 2147483647
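A sketch of the book-level record from slide 42, assuming the full text is simply the pages joined together. The value 2147483647 quoted on the slide is the largest 32-bit signed integer; the field names and the function name are hypothetical.

```python
# The limit quoted on the slide: 2**31 - 1, the largest 32-bit signed integer.
MAX_FIELD_LENGTH = 2**31 - 1


def build_book_document(book):
    """One metadata record per book, carrying the whole book text in one field."""
    return {
        "id": book["id"],
        "Title": book["title"],
        # Hypothetical field holding every page, so that title and content
        # conditions can be evaluated against the same record.
        "Text": "\n".join(book["pages"]),
    }
```

Raising the limit matters because a whole book in one field can otherwise be truncated at indexing time, leaving later pages unsearchable.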
  • 43. Back to the example. Example: an advanced search for Title: Mobile Technology AND Content: “cloud computing”
  • 44. Solution (diagram): both Title: Mobile Technology and Content: “cloud computing” are sent to the metadata index, which returns the facet result
  • 45. Solution (diagram): Title: Mobile Technology is sent to the metadata index and Content: “cloud computing” to the content index; the results are intersected, and the metadata index returns the facet result
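With the book text held in the metadata index, the example search can be expressed as one request that also asks for facets. The sketch below only builds request parameters; q, fl, rows, facet and facet.field are standard Solr parameters, while the Title and Text field names and the facet field names are assumptions.

```python
from urllib.parse import urlencode


def build_advanced_search(title, phrase, facet_fields, rows=10):
    """Build Solr parameters for a title AND content-phrase search with facets."""
    params = [
        ("q", f'Title:({title}) AND Text:"{phrase}"'),
        ("fl", "id"),        # return only IDs
        ("rows", str(rows)),
        ("facet", "true"),
    ]
    # One facet.field entry per facet, repeated as Solr expects.
    params += [("facet.field", field) for field in facet_fields]
    return params


def to_query_string(params):
    """Encode the parameter pairs for use in a request URL."""
    return urlencode(params)
```

For the running example, `build_advanced_search("Mobile Technology", "cloud computing", ["Category", "Language"])` produces a single query whose response carries both the matching IDs and the facet counts, so no intersection step is needed.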
  • 46. Separate indexes vs. all in one
      • Separate indexes
          + Indexing time
          + Index size
          - Processing results (facets, etc.)
          - Scoring
  • 47. Separate indexes vs. all in one
      • Separate indexes
          + Indexing time
          + Index size
          - Processing results (facets, etc.)
          - Scoring
      • One index
          - Index size
          - Indexing time
          + Scoring
          + Processing time
  • 48. Book content index (diagram): five per-language indexes: AR, EN, FR, IT, SP
  • 50. Searching
      • Simple and advanced search
          – Cache only the resulting IDs
      • Highlighting search results
          – Get the full search result and highlight per page of results
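One way to read slide 50 is that only the matching IDs are cached and highlighting is requested just for the result page being displayed. The sketch below follows that reading, which is an assumption; the class and method names are hypothetical.

```python
class ResultIdCache:
    """Cache only the matching IDs per query, not the full records."""

    def __init__(self):
        self._ids_by_query = {}

    def store(self, query, ids):
        self._ids_by_query[query] = list(ids)

    def page(self, query, page_number, page_size):
        """Return the IDs for one result page (pages are numbered from 1)."""
        ids = self._ids_by_query[query]
        start = (page_number - 1) * page_size
        return ids[start:start + page_size]
```

Only the IDs returned by `page` would then be passed to a highlighting request, so the highlighting cost depends on the page size rather than on the total number of hits.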
  • 51. Book Content Search
      • Search using
          – Search query
          – Book ID
          – List of pages’ IDs
      • Highlights
      • Annotations
          – Saved currently in DB
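The three inputs listed on slide 51 map naturally onto a query plus filter queries. This sketch builds the parameters only; q, fq, hl and hl.fl are standard Solr parameters, while the Text, BookID and PageID field usage and the function name are assumptions, and IDs are assumed to need no escaping.

```python
def build_book_content_search(query, book_id, page_ids=None):
    """Build Solr parameters for searching inside a single book."""
    params = [
        ("q", f"Text:({query})"),
        ("fq", f"BookID:{book_id}"),   # restrict hits to one book
        ("hl", "true"),                # ask Solr for highlighted snippets
        ("hl.fl", "Text"),
    ]
    if page_ids:
        # Optionally restrict hits further to a given list of pages.
        joined = " OR ".join(str(page_id) for page_id in page_ids)
        params.append(("fq", f"PageID:({joined})"))
    return params
```

Filter queries are used for the book and page restrictions because they narrow the result set without influencing relevance scores.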
  • 52. Faceting
      • Fixed facet fields
          – Category, sub-category, language, etc.
          – Stored, indexed, exact fields
      • Process facets from different indices
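Processing facets from different indices amounts to summing the per-value counts each index returns. A minimal sketch, assuming each index's facet counts have already been parsed into a dictionary; the function name is hypothetical.

```python
from collections import Counter


def merge_facets(per_index_counts):
    """Sum facet counts for one facet field across several indexes."""
    total = Counter()
    for counts in per_index_counts:
        total.update(counts)  # adds counts for values already seen
    return dict(total)
```

For example, merging `{"History": 3, "Science": 1}` from one index with `{"History": 2, "Art": 5}` from another gives 5 for History, 1 for Science and 5 for Art.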
  • 53. Personalization
      • Using a separate index for personalization
          – Different Solr fields for different languages
          – Search across all fields
      • Saving in both Solr and DB
      • Indexing tags, ratings and comments using a type field
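A sketch of what a record in the personalization index might look like, assuming a Type field that separates tags, ratings and comments and one text field per language. All field names, the Text_<language> naming pattern and the function name are hypothetical.

```python
ALLOWED_TYPES = {"tag", "rating", "comment"}


def build_personal_document(user_id, book_id, kind, value, language):
    """Build one personalization record for a user's tag, rating or comment."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown personalization type: {kind}")
    doc = {"UserID": user_id, "BookID": book_id, "Type": kind}
    if kind == "rating":
        doc["Rating"] = int(value)
    else:
        # Hypothetical per-language field, e.g. Text_ar or Text_en, so each
        # language can be analyzed with its own settings.
        doc[f"Text_{language}"] = value
    return doc
```

Searching "across all fields" would then mean querying every Text_<language> field together, while the Type field lets a query target only tags or only comments.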
  • 54. Future
      • Book mobile application using Solr
      • Using Hadoop
      • Indexing other digital media (maps, audio, video)
  • 55. Contact
      • engy.morsy@bibalex.org
      • Library website: http://bibalex.org
      • Digital Asset Repository: http://dar.bibalex.org
  • 57. Thank you…