Alfonso Focareta
Angelo Quercioli
Create your own search engine with Apache Solr


alfonso.focareta@pronetics.it (@afocareta) Pro-netics S.p.A.
angelo.quercioli@pronetics.it Pro-netics S.p.A
Solr & Lucene
Lucene: features
•   High-performance, scalable full-text search library

•   100% pure Java

•   Focus: indexing and searching documents (a "Document" is just a list of
    name+value pairs)

•   No crawlers or document parsing

•   Flexible text analysis (tokenizers + token filters)
Solr: features
•   A full text search server based on Lucene
•   XML/HTTP, JSON Interfaces
•   Faceted Search (category counting)
•   Flexible data schema to define types and fields
•   Hit Highlighting
•   Configurable Advanced Caching
•   Index Replication
•   Extensible Open Architecture, Plugins
•   Web Administration Interface
•   Written in Java 5, deployable as a WAR
Solr: license
               OPEN SOURCE!!
                   Apache License
Solr: Architecture
Solr: Installing and Starting
• JDK 5 or above installed

• Open http://localhost:8983/solr/admin/ in your web browser to reach the admin interface
Solr: Define a schema.xml
             Define a Schema (schema.xml)

The schema.xml file describes the structure of the indexed data.



•   Type definitions
•   Field definitions
•   CopyField section
•   Additional definitions
Solr: Define a schema.xml (type definition)
                       Type Definition

List of types and components (simple and complex)
• Primitive types
• WhitespaceTokenizerFactory
• StopFilterFactory
• WordDelimiterFilterFactory
• LowerCaseFilterFactory
• SnowballPorterFilterFactory (stemming)
Solr: Define a schema.xml (type definition- example)
                   Type Definition - Example
Solr: Define a schema.xml (field definitions)
                              Field Definitions

•   Field Attributes: name, type, indexed, stored, multiValued, omitNorms,
    termVectors

<field name="id" type="string" indexed="true" stored="true"/>
<field name="sku" type="textTight" indexed="true" stored="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="false"/>
<field name="price" type="sfloat" indexed="true" stored="false"/>
<field name="category" type="text_ws" indexed="true" stored="true"
    multiValued="true"/>

•   Dynamic Fields

<dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
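To make the relationship between explicit fields and dynamic fields concrete, here is a minimal Python sketch. It only mimics the idea (an explicit definition is used when one exists, otherwise the name is matched against the suffix patterns); it is not Solr's implementation, and `resolve_type` is a hypothetical helper written for this illustration.

```python
from fnmatch import fnmatchcase

# Explicit fields and dynamic-field patterns copied from the slide.
FIELDS = {
    "id": "string", "sku": "textTight", "name": "text",
    "inStock": "boolean", "price": "sfloat", "category": "text_ws",
}
DYNAMIC_FIELDS = [("*_i", "sint"), ("*_s", "string"), ("*_t", "text")]


def resolve_type(field_name):
    """Return the field type for a name, or None if nothing matches."""
    if field_name in FIELDS:  # explicit definitions are checked first
        return FIELDS[field_name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):  # glob match, e.g. "*_i"
            return field_type
    return None


for candidate in ("price", "weight_i", "nickname_s", "color"):
    print(candidate, "->", resolve_type(candidate))
```

With dynamic fields, a document can carry a field such as `weight_i` without the schema listing it by name.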
Solr: Define a schema.xml (Copy Field- example)
                                  Copy Field
Copies one field to another at index time.
Case #1: Analyze the same field in different ways
   – copy into a field with a different analyzer
   – boost exact-case, exact-punctuation matches
   – language translations, thesaurus, soundex

<field name="title" type="text"/>
<field name="title_exact" type="text_exact" stored="false"/>
<copyField source="title" dest="title_exact"/>

Case #2: Index multiple fields into a single searchable field
Solr: Indexing Method
                          Indexing Method

You put documents into Solr (called "indexing") via:
•   XML
•   JSON
•   CSV
•   Binary over HTTP (multipart request)
Solr: Indexing (Java Api)
                           Indexing by Solrj

Send an XML message like this:
    <add><doc>
     <field name="id">043564</field>
     <field name="name">Alfonso</field>
     <field name="surname">Focareta</field>
     <field name="category">developer</field>
     <field name="language">Italian</field>
     <field name="language">English</field>
    </doc></add>
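A message of this shape can also be generated rather than written by hand. The following Python sketch uses only the standard library; `build_add_message` is a hypothetical helper, and the choice to turn list values into repeated fields (as with "language" above) is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc):
    """Build an <add><doc>...</doc></add> message from a dict.

    List values become repeated <field> elements (multi-valued fields).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


message = build_add_message({
    "id": "043564",
    "name": "Alfonso",
    "surname": "Focareta",
    "category": "developer",
    "language": ["Italian", "English"],
})
print(message)
```

Building the message with an XML library also takes care of escaping characters such as `&` and `<` in field values.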
Solr: Indexing (Solrj)
                                      Solrj

SolrJ is a Java client for Solr. It offers a Java interface to
  add, update, and query the Solr index

                                   Example ->
Solr: Indexing (Solrj) Example
Solr: Delete Document
                         Delete document(s)

• Delete by ID (most efficient)
       <delete>
           <id>05591</id>
           <id>32552</id>
       </delete>

• Delete by Query
       <delete>
            <query>language:english</query>
       </delete>
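Both delete messages are small enough to build programmatically. This is a minimal Python sketch with two hypothetical helpers, `delete_by_ids` and `delete_by_query`; it only produces the XML text and does not contact a server.

```python
import xml.etree.ElementTree as ET


def delete_by_ids(ids):
    """Return a <delete> message listing one <id> element per document."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Return a <delete> message that removes every document matching query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


print(delete_by_ids(["05591", "32552"]))
print(delete_by_query("language:english"))
```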
Solr: Commit and Optimize
                    Commit and Optimize

Commit: when you are indexing documents into Solr, none
  of the changes you make will be visible to searches until you run
  the commit command!

Optimize: the command that merges the index
segments (increasing search speed) and removes any deleted
(replaced) documents.
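Commit and optimize are themselves sent as small XML update messages (`<commit/>` and `<optimize/>`). The Python sketch below prepares such requests without sending them. The update URL is an assumption chosen to mirror the localhost address used elsewhere in this deck, and `update_request` is a hypothetical helper.

```python
import urllib.request

# Assumed endpoint; adjust to your own installation.
UPDATE_URL = "http://localhost:8983/solr/update"


def update_request(xml_message):
    """Prepare (but do not send) a POST carrying one XML update message."""
    return urllib.request.Request(
        UPDATE_URL,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


commit = update_request("<commit/>")
optimize = update_request("<optimize/>")
print(commit.get_method(), commit.full_url, commit.data)
print(optimize.get_method(), optimize.full_url, optimize.data)
# urllib.request.urlopen(commit) would actually send the commit.
```

A typical sequence is to post a batch of add or delete messages and then a single commit, rather than committing after every document.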
Solr: Searching
                     Searching
You can search documents in Solr over HTTP or with the SolrJ library.
http://localhost:8983/solr/select?q=language:italian&start=0&rows=2&fl=name,surname

<response>
 <result numFound="15" start="0">
  <doc>
   <str name="name">Angelo</str>
   <str name="surname">Quercioli</str>
  </doc>
  <doc>
   <str name="name">Alfonso</str>
   <str name="surname">Focareta</str>
  </doc>
 </result>
</response>
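The HTTP route can be sketched in a few lines of Python: build the select URL from its parameters, then read the documents out of the XML response. `build_search_url` and `parse_docs` are hypothetical helpers, and the sample response is the one from the slide, so nothing here contacts a server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://localhost:8983/solr/select"


def build_search_url(query, start=0, rows=10, fields=None):
    """Assemble a select URL; urlencode takes care of escaping."""
    params = {"q": query, "start": start, "rows": rows}
    if fields:
        params["fl"] = ",".join(fields)  # fl = list of fields to return
    return BASE_URL + "?" + urlencode(params)


def parse_docs(response_xml):
    """Return (numFound, list of dicts) from an XML response."""
    result = ET.fromstring(response_xml).find("result")
    docs = [{f.get("name"): f.text for f in doc}
            for doc in result.findall("doc")]
    return int(result.get("numFound")), docs


SAMPLE = """<response>
 <result numFound="15" start="0">
  <doc><str name="name">Angelo</str><str name="surname">Quercioli</str></doc>
  <doc><str name="name">Alfonso</str><str name="surname">Focareta</str></doc>
 </result>
</response>"""

print(build_search_url("language:italian", rows=2, fields=["name", "surname"]))
print(parse_docs(SAMPLE))
```

Note that `numFound` (15) is the total number of matches, while `rows` limits how many documents come back in one response.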
Solr: Searching (Response Format)
                  Response Format
You can add &wt=json for a JSON-formatted response

{"response": {"numFound":15, "start":0,
  "docs": [
    {"name":"Angelo", "surname":"Quercioli"},
    {"name":"Alfonso", "surname":"Focareta"}
  ]
}}
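A JSON response is convenient for paging, because `numFound`, `start` and the requested `rows` are enough to work out the next offset. The Python sketch below assumes the result block sits under a top-level `response` key (the key name used by the facet example later in this deck); `next_start` is a hypothetical helper.

```python
import json

SAMPLE = json.dumps({
    "response": {
        "numFound": 15,
        "start": 0,
        "docs": [
            {"name": "Angelo", "surname": "Quercioli"},
            {"name": "Alfonso", "surname": "Focareta"},
        ],
    }
})


def next_start(body, rows):
    """Return the start offset of the next page, or None on the last page."""
    result = json.loads(body)["response"]
    following = result["start"] + rows
    return following if following < result["numFound"] else None


print(next_start(SAMPLE, rows=2))
```

The returned value is what a client would pass as `&start=` in the following request.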
Solr: Searching – Query Syntax
                         Lucene Query Syntax

• italian english
    Equiv: italian OR english
    QueryParser default operator is "OR"/optional

• Wildcard searches: ang?o, alf*o, rom*

• +italian +english -name:angelo
     Equiv: italian AND english NOT name:angelo
• "justice league" -name:aquaman
• releaseDate:[2012-01-01T00:00:00Z TO 2013-12-31T23:59:59Z]
• description:"legge roma"~100
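Date ranges are the easiest part of this syntax to get wrong, since the timestamps must be ISO 8601 in UTC with a trailing Z (for example 2012-01-01T00:00:00Z). A minimal Python sketch for producing such a range clause follows; `solr_date` and `date_range` are hypothetical helpers, not part of any Solr client.

```python
from datetime import datetime, timezone


def solr_date(moment):
    """Format an aware datetime as UTC ISO 8601 with a trailing Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range(field, start, end):
    """Build an inclusive range clause: field:[start TO end]."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


clause = date_range(
    "releaseDate",
    datetime(2012, 1, 1, tzinfo=timezone.utc),
    datetime(2013, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(clause)
```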
Solr: Searching – Query Syntax 2
               Lucene Query Syntax 2




• *:*
• (angelo AND "pier francesco") OR
  (+federico +paolo)
Solr: Function Query
                        Function Query
•   Allows adding a function of a field value to the score
     – Boost recently added or popular documents
•   Current parser only supports function notation
•   Example: log(sum(popularity,1))
•   sum, min, max, log, sqrt, currency, ms, etc.
•   scale(x, target_min, target_max)
     – calculates the min & max of x across all docs
•   map(x, min, max, target)
     – useful for dealing with defaults
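To get a feel for what these functions compute, here is a plain-Python emulation of three of them. It is an illustration under stated assumptions, not Solr code: it assumes log is base 10, map replaces values inside [min, max] with target, and scale maps the observed minimum and maximum linearly onto the target range. The names `solr_log`, `solr_map` and `solr_scale` are hypothetical.

```python
import math


def solr_log(x):
    """Assumed semantics of log(x): base-10 logarithm."""
    return math.log10(x)


def solr_map(x, low, high, target):
    """Assumed semantics of map(): values in [low, high] become target."""
    return target if low <= x <= high else x


def solr_scale(values, target_min, target_max):
    """Assumed semantics of scale(): linear rescale using min/max of all values."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [float(target_min) for _ in values]
    factor = (target_max - target_min) / (highest - lowest)
    return [(v - lowest) * factor + target_min for v in values]


popularity = [0, 9, 99]
# log(sum(popularity,1)): the +1 avoids taking the log of zero.
print([solr_log(p + 1) for p in popularity])
# map(x,0,0,1): treat a missing/zero value as 1.
print([solr_map(p, 0, 0, 1) for p in popularity])
print(solr_scale([0, 5, 10], 1, 2))
```

The `+1` inside the log and the `map` call both show the "dealing with defaults" idea: a document with popularity 0 still gets a usable value.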
Solr: Boosted Query
                           Boosted Query

• Score is multiplied instead of added
  – New local params {!...} syntax added
&q={!boost b=sqrt(popularity)}"super man"

• Parameter dereferencing in local params
&q={!boost b=$boost v=$userq}
&boost=sqrt(popularity)
&userq="super man"
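Parameter dereferencing keeps the user's text out of the local-params expression, which makes the request easy to assemble in code. The Python sketch below builds the three parameters from the slide and URL-encodes them; `boosted_params` is a hypothetical helper and no request is sent.

```python
from urllib.parse import urlencode, parse_qs


def boosted_params(user_query, boost_function):
    """Return request parameters that reference $boost and $userq."""
    return {
        "q": "{!boost b=$boost v=$userq}",  # fixed template
        "boost": boost_function,            # e.g. sqrt(popularity)
        "userq": user_query,                # raw user input, kept separate
    }


query_string = urlencode(boosted_params('"super man"', "sqrt(popularity)"))
print(query_string)
# Decoding shows the original values survive the round trip.
print(parse_qs(query_string))
```

Because the `q` template never changes, only `userq` needs to vary from one search to the next.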
Solr: Facet Query
                                  Facet Query
Faceted search breaks up search results into multiple categories, with a count for each

http://solr/select?q=foo&wt=json&indent=on
 &facet=true&facet.field=cat
 &facet.query=price:[0 TO 100]
 &facet.query=manu:IBM

{"response":{"numFound":26,"start":0,"docs":[…]},
 "facet_counts":{
   "facet_queries":{
     "price:[0 TO 100]":6,
     "manu:IBM":2},
   "facet_fields":{
     "cat":[ "electronics",14, "memory",3,
           "card",2, "connector",2]
   }}}
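In this JSON layout the counts for a facet field arrive as one flat list that alternates value and count. A short Python sketch for turning that list into a dictionary follows; `facet_counts` is a hypothetical helper, and the sample uses an empty `docs` list because the slide elides the documents.

```python
import json

SAMPLE = json.dumps({
    "response": {"numFound": 26, "start": 0, "docs": []},  # docs elided on the slide
    "facet_counts": {
        "facet_queries": {"price:[0 TO 100]": 6, "manu:IBM": 2},
        "facet_fields": {
            "cat": ["electronics", 14, "memory", 3, "card", 2, "connector", 2],
        },
    },
})


def facet_counts(body, field):
    """Pair up the flat [value, count, value, count, ...] list for one field."""
    flat = json.loads(body)["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


print(facet_counts(SAMPLE, "cat"))
print(json.loads(SAMPLE)["facet_counts"]["facet_queries"])
```

Facet queries, by contrast, already come back as a plain mapping from the query string to its count.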
Solr: Filter Query
                                     Filter Query

• Filters are restrictions applied in addition to the query
• Used in faceting to narrow the results
• Filters are cached separately for speed

1. User queries for memory; the query sent to Solr is
 &q=memory&fq=inStock:true&facet=true&…
2. User selects 1GB memory size
 &q=memory&fq=inStock:true&fq=size:1GB&…
3. User selects DDR2 memory type
 &q=memory&fq=inStock:true&fq=size:1GB
           &fq=type:DDR2&…
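The drill-down steps amount to keeping the main query fixed and appending one `fq` parameter per selection. Here is a minimal Python sketch of that pattern; `drill_down_query` is a hypothetical helper, and only the parameters shown on the slide are produced.

```python
from urllib.parse import urlencode, parse_qs


def drill_down_query(query, filters):
    """Encode q plus one fq parameter per active filter."""
    params = [("q", query)]
    params += [("fq", clause) for clause in filters]  # fq may repeat
    params.append(("facet", "true"))
    return urlencode(params)


filters = ["inStock:true"]
step1 = drill_down_query("memory", filters)

filters.append("size:1GB")    # user picks a memory size
step2 = drill_down_query("memory", filters)

filters.append("type:DDR2")   # user picks a memory type
step3 = drill_down_query("memory", filters)

for step in (step1, step2, step3):
    print(step)
```

Keeping each restriction in its own `fq` fits the point above that filters are cached separately: a filter such as `inStock:true` stays the same across all three requests.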
Demo!
                   Demo!
                   Questions ?
