Create your own search engine with Apache Solr

Alfonso Focareta
Angelo Quercioli
alfonso.focareta@pronetics.it (@afocareta) Pro-netics S.p.A.
angelo.quercioli@pronetics.it Pro-netics S.p.A.

Solr & Lucene

Lucene: features

• High performance, full-text & scalable search library

• 100% pure Java

• Focus: indexing and searching documents (a "Document" is just a list of
name+value pairs)

• No crawlers or document parsing

• Flexible text analysis (tokenizers + token filters)

Solr: features

• A full text search server based on Lucene
• XML/HTTP, JSON Interfaces
• Faceted Search (category counting)
• Flexible data schema to define types and fields
• Hit Highlighting
• Configurable Advanced Caching
• Index Replication
• Extensible Open Architecture, Plugins
• Web Administration Interface
• Written in Java 5, deployable as a WAR

Solr: license

OPEN SOURCE!!
Apache License

Solr: Architecture

Solr: Installing and Starting

• JDK 5 or above installed

• Open http://localhost:8983/solr/admin/ in your web browser to reach the admin interface

Solr: Define a schema.xml

Define a Schema (schema.xml)

The file schema.xml describes the structure of the indexed data.

• Type definitions
• Field definitions
• CopyField section
• Additional definitions

Solr: Define a schema.xml (type definition)

Type Definition

List of types and components (simple and complex)
• Primitive types
• WhitespaceTokenizerFactory
• StopFilterFactory
• WordDelimiterFilterFactory
• LowerCaseFilterFactory
• SnowballPorterFilterFactory (stemming)
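
To make the idea of an analysis chain concrete, here is a minimal Python sketch of what a tokenizer followed by a few token filters does to a piece of text. It is a simplified stand-in, not Solr's implementation: the stop-word list and the function names are hypothetical, the word-delimiter step only splits on non-alphanumeric characters, and stemming is left out.

```python
import re

# Hypothetical stop-word list; a real Solr setup reads its own from a file.
STOP_WORDS = {"a", "an", "and", "the", "of", "with"}


def whitespace_tokenize(text):
    """Split the text on runs of whitespace."""
    return text.split()


def stop_filter(tokens, ignore_case=True):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens
            if (t.lower() if ignore_case else t) not in STOP_WORDS]


def word_delimiter_filter(tokens):
    """Split each token on non-alphanumeric delimiters (simplified)."""
    parts = []
    for token in tokens:
        parts.extend(p for p in re.split(r"[^0-9A-Za-z]+", token) if p)
    return parts


def lower_case_filter(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def analyze(text):
    """Run the tokenizer, then each filter in order."""
    tokens = whitespace_tokenize(text)
    tokens = stop_filter(tokens)
    tokens = word_delimiter_filter(tokens)
    return lower_case_filter(tokens)


print(analyze("The Wi-Fi Router of Rome"))
```

The order of the filters matters: here stop words are removed before tokens are split and lowercased, mirroring the order of the list above.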

Solr: Define a schema.xml (type definition - example)

Type Definition - Example

Solr: Define a schema.xml (field definitions)

Field Definitions

• Field Attributes: name, type, indexed, stored, multiValued, omitNorms,
termVectors

<field name="id" type="string" indexed="true" stored="true"/>
<field name="sku" type="textTight" indexed="true" stored="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="false"/>
<field name="price" type="sfloat" indexed="true" stored="false"/>
<field name="category" type="text_ws" indexed="true" stored="true"
multiValued="true"/>

• Dynamic Fields

<dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

Solr: Define a schema.xml (Copy Field- example)

Copy Field
Copies one field to another at index time.
Case #1: Analyze the same field in different ways
– copy into a field with a different analyzer
– boost exact-case, exact-punctuation matches
– language translations, thesaurus, soundex

<field name="title" type="text"/>
<field name="title_exact" type="text_exact" stored="false"/>
<copyField source="title" dest="title_exact"/>

Case #2: Index multiple fields into a single searchable field
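
The second case can be pictured with a small Python simulation. This is only a sketch of the idea, not how Solr implements copyField: the rule list, the catch-all field name "text" and the function name are all hypothetical.

```python
def apply_copy_fields(doc, copy_rules):
    """Return a copy of doc with each source field's values appended to its destination.

    Values are always taken from the original document, so one copy rule
    never feeds another.
    """
    indexed = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_rules:
        if source in doc:
            indexed.setdefault(dest, []).extend(doc[source])
    return indexed


# Hypothetical rules: funnel several fields into one catch-all "text" field.
COPY_RULES = [("name", "text"), ("category", "text"), ("language", "text")]

doc = {
    "name": ["Alfonso"],
    "category": ["developer"],
    "language": ["Italian", "English"],
}
indexed = apply_copy_fields(doc, COPY_RULES)
print(indexed["text"])
```

A query against the single "text" field can then match a value that originally came from any of the source fields.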

Solr: Indexing Method

Indexing Method

You put documents into Solr (called "indexing") via:
• XML
• JSON
• CSV
• Binary over HTTP (multipart request)

Solr: Indexing (Java Api)

Indexing by Solrj

Send an XML message like this:
<add><doc>
<field name="id">043564</field>
<field name="name">Alfonso</field>
<field name="surname">Focareta</field>
<field name="category">developer</field>
<field name="language">Italian</field>
<field name="language">English</field>
</doc></add>
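
Such a message does not have to be written by hand. As a rough sketch, the Python snippet below builds the same kind of add message from a dictionary using only the standard library; the helper name build_add_message is hypothetical, and a list value is assumed to mean a multi-valued field.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc):
    """Build an <add><doc>...</doc></add> message from a dict of field values.

    A list value produces one <field> element per item (multi-valued field).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


message = build_add_message({
    "id": "043564",
    "name": "Alfonso",
    "surname": "Focareta",
    "category": "developer",
    "language": ["Italian", "English"],
})
print(message)
```

Building the XML through an XML library also takes care of escaping characters such as & and < in field values.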

Solr: Indexing (Solrj)

Solrj

Solrj is a Java client for Solr. It offers a Java interface to
add, update, and query the Solr index.

Example ->

Solr: Indexing (Solrj) Example

Solr: Delete Document

Delete document(s)

• Delete by Id (most efficient)
<delete>
<id>05591</id>
<id>32552</id>
</delete>

• Delete by Query
<delete>
<query>language:english</query>
</delete>
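
The two delete messages above can be generated in the same way as the add message. The following Python sketch uses a hypothetical helper, build_delete_message, and builds one message per call.

```python
import xml.etree.ElementTree as ET


def build_delete_message(ids=(), queries=()):
    """Build a <delete> message with one <id> or <query> child per entry."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = str(doc_id)
    for query in queries:
        ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


by_id = build_delete_message(ids=["05591", "32552"])
by_query = build_delete_message(queries=["language:english"])
print(by_id)
print(by_query)
```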

Solr: Commit and Optimize

Commit and Optimize

Commit: while you are indexing documents to Solr, none
of the changes you make are visible to searches until you run
the commit command!

Optimize: the command that merges the index segments into
fewer, larger segments (which can speed up searches) and purges
any deleted (replaced) documents.
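
Both commands are themselves small XML messages posted to the update handler. The Python sketch below only constructs the HTTP requests and does not send them; the URL http://localhost:8983/solr/update is an assumption about a default local install, and build_update_request is a hypothetical helper name.

```python
import urllib.request

# Assumed update endpoint for a default local install.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_message, url=UPDATE_URL):
    """Wrap an XML update message in an HTTP POST request (not sent here)."""
    return urllib.request.Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


commit_request = build_update_request("<commit/>")
optimize_request = build_update_request("<optimize/>")

# To actually send one against a running server:
# with urllib.request.urlopen(commit_request) as response:
#     print(response.status)
```

The same helper could carry the add and delete messages shown earlier, followed by a commit to make the changes visible.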

Solr: Searching

Searching
You can search documents in Solr over HTTP or with the Solrj library.
http://localhost:8983/solr/select?q=language:italian&start=0&rows=2&fl=name,surname

<response>
<result numFound="15" start="0">
<doc>
<str name="name">Angelo</str>
<str name="surname">Quercioli</str>
</doc>
<doc>
<str name="name">Alfonso</str>
<str name="surname">Focareta</str>
</doc>
</result>
</response>
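
As an illustration of the HTTP route, this Python sketch builds the select URL from its parameters and reads a response shaped like the one above. The helper names are hypothetical, the sample response is hard-coded rather than fetched, and the base URL assumes a default local install.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed select endpoint for a default local install.
SELECT_URL = "http://localhost:8983/solr/select"

SAMPLE_RESPONSE = """
<response>
<result numFound="15" start="0">
<doc><str name="name">Angelo</str><str name="surname">Quercioli</str></doc>
<doc><str name="name">Alfonso</str><str name="surname">Focareta</str></doc>
</result>
</response>
"""


def build_select_url(query, start=0, rows=10, fields=None):
    """Build a select URL; urlencode percent-escapes ':' and ',' in values."""
    params = {"q": query, "start": start, "rows": rows}
    if fields:
        params["fl"] = ",".join(fields)
    return SELECT_URL + "?" + urllib.parse.urlencode(params)


def parse_xml_response(xml_text):
    """Return (numFound, list of docs as dicts) from an XML response."""
    result = ET.fromstring(xml_text).find("result")
    docs = [{el.get("name"): el.text for el in doc}
            for doc in result.findall("doc")]
    return int(result.get("numFound")), docs


url = build_select_url("language:italian", start=0, rows=2,
                       fields=["name", "surname"])
num_found, docs = parse_xml_response(SAMPLE_RESPONSE)
print(url)
print(num_found, docs)
```

Note that numFound is the total number of matches, while start and rows control which page of documents is returned.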

Solr: Searching (Response Format)

Response Format
You can add &wt=json for a JSON-formatted response

{"response": {"numFound":15, "start":0,
"docs": [
{"name":"Angelo", "surname":"Quercioli"},
{"name":"Alfonso", "surname":"Focareta"}
]
}}
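
A JSON response is straightforward to consume from a script. The Python sketch below assumes the payload sits under a top-level "response" key, as in the facet example later in this deck; the sample text and the helper name extract_docs are illustrative only.

```python
import json

SAMPLE_JSON = """
{"response": {"numFound": 15, "start": 0,
  "docs": [
    {"name": "Angelo", "surname": "Quercioli"},
    {"name": "Alfonso", "surname": "Focareta"}
  ]
}}
"""


def extract_docs(json_text):
    """Return (numFound, docs) from a JSON search response."""
    body = json.loads(json_text)["response"]
    return body["numFound"], body["docs"]


num_found_json, docs_json = extract_docs(SAMPLE_JSON)
for doc in docs_json:
    print(doc["name"], doc["surname"])
```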

Solr: Searching – Query Syntax

Lucene Query Syntax

• italian english
Equiv: italian OR english
QueryParser default operator is "OR"/optional

• Wildcard searches: ang?o, alf*o, rom*

• +italian +english -name:angelo
Equiv: italian AND english NOT name:angelo
• "justice league" -name:aquaman
• releaseDate:[2012-01-01T00:00:00Z TO 2013-12-31T23:59:59Z]
• description:"legge roma"~100
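
Date range queries are easy to get wrong by hand, since the timestamps need the form YYYY-MM-DDThh:mm:ssZ in UTC. A small Python sketch for generating them, with hypothetical helper names solr_date and range_query:

```python
from datetime import datetime, timezone


def solr_date(moment):
    """Format a timezone-aware datetime as a UTC timestamp ending in 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def range_query(field, start, end):
    """Build an inclusive range query of the form field:[start TO end]."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


query = range_query(
    "releaseDate",
    datetime(2012, 1, 1, tzinfo=timezone.utc),
    datetime(2013, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(query)
```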

Solr: Searching – Query Syntax 2

Lucene Query Syntax 2

• *:* (matches all documents)
• (angelo AND "pier francesco") OR
(+federico +paolo)

Solr: Function Query

Function Query
• Allows adding a function of a field value to the score
– Boost recently added or popular documents
• Current parser only supports function notation
• Example: log(sum(popularity,1))
• sum, min, max, log, sqrt, currency, ms … etc
• scale(x, target_min, target_max)
– calculates min & max of x across all docs
• map(x, min, max, target)
– useful for dealing with defaults
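
To show what these functions compute, here is a plain Python sketch of three of them. It only mirrors the arithmetic and is not Solr code: the Python function names are hypothetical, and log is taken to be the base-10 logarithm, which is how Solr's log function is documented.

```python
import math


def log_popularity(popularity):
    """log(sum(popularity,1)): base-10 log, with +1 so a popularity of 0 gives 0."""
    return math.log10(popularity + 1)


def scale(values, target_min, target_max):
    """scale(x, target_min, target_max): rescale using min and max across all docs."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [target_min for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]


def map_value(x, range_min, range_max, target):
    """map(x, min, max, target): replace values inside [min, max] with target."""
    return target if range_min <= x <= range_max else x


print(log_popularity(99))
print(scale([0, 5, 10], 1, 2))
print(map_value(0, 0, 0, 1))
```

The last call shows the "defaults" use case: a field value of 0 (for example, a missing popularity) is mapped to 1, while any other value passes through unchanged.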

Solr: Boosted Query

Boosted Query

• Score is multiplied instead of added
– New local params {!...} syntax added
&q={!boost b=sqrt(popularity)}"super man"

• Parameter dereferencing in local params
&q={!boost b=$boost v=$userq}
&boost=sqrt(popularity)
&userq="super man"
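
Parameter dereferencing is convenient when the user's text comes from an application, because the raw input stays in its own parameter. A Python sketch of assembling such a request's query string follows; the helper name boosted_query_params is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def boosted_query_params(user_query, boost_function):
    """Keep the user's text and the boost function in separate parameters."""
    return {
        "q": "{!boost b=$boost v=$userq}",
        "boost": boost_function,
        "userq": user_query,
    }


params = boosted_query_params('"super man"', "sqrt(popularity)")
query_string = urlencode(params)
print(query_string)
```

urlencode percent-escapes the braces, dollar signs and quotes, so the query string can be appended to a select URL as is.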

Solr: Facet Query

Facet Query
Faceted search breaks up search results into multiple categories

http://solr/select?q=foo&wt=json&indent=on
&facet=true&facet.field=cat
&facet.query=price:[0 TO 100]
&facet.query=manu:IBM

{"response":{"numFound":26,"start":0,"docs":[…]},
"facet_counts":{
"facet_queries":{
"price:[0 TO 100]":6,
"manu:IBM":2},
"facet_fields":{
"cat":[ "electronics",14, "memory",3,
"card",2, "connector",2]
}}}
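
In this response the counts for a facet field arrive as one flat list that alternates value and count. A short Python sketch for turning that list into pairs, using a hard-coded sample and a hypothetical helper name facet_field_counts:

```python
import json

SAMPLE_FACETS = """
{"facet_counts": {
  "facet_queries": {"price:[0 TO 100]": 6, "manu:IBM": 2},
  "facet_fields": {
    "cat": ["electronics", 14, "memory", 3, "card", 2, "connector", 2]
  }
}}
"""


def facet_field_counts(facet_counts, field):
    """Pair up the flat [value, count, value, count, ...] list for one field."""
    flat = facet_counts["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


facet_counts = json.loads(SAMPLE_FACETS)["facet_counts"]
cat_counts = facet_field_counts(facet_counts, "cat")
for value, count in cat_counts:
    print(value, count)
```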

Solr: Filter Query

Filter Query

• Filters are restrictions in addition to the query
• Use in faceting to narrow the results
• Filters are cached separately for speed

1. User queries for memory; the query sent to Solr is
&q=memory&fq=inStock:true&facet=true&…
2. User selects 1GB memory size
&q=memory&fq=inStock:true&fq=size:1GB&…
3. User selects DDR2 memory type
&q=memory&fq=inStock:true&fq=size:1GB&fq=type:DDR2&…
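
The drill-down steps above amount to appending one more fq parameter each time the user picks a facet value. A minimal Python sketch, with a hypothetical helper build_filtered_query:

```python
from urllib.parse import urlencode, parse_qs


def build_filtered_query(query, filters, facet=True):
    """Build a query string with one fq parameter per active filter."""
    params = [("q", query)]
    params.extend(("fq", f) for f in filters)
    if facet:
        params.append(("facet", "true"))
    return urlencode(params)


filters = ["inStock:true"]
step1 = build_filtered_query("memory", filters)

filters.append("size:1GB")      # user selects 1GB memory size
step2 = build_filtered_query("memory", filters)

filters.append("type:DDR2")     # user selects DDR2 memory type
step3 = build_filtered_query("memory", filters)

print(step1)
print(step2)
print(step3)
```

Keeping each restriction in its own fq parameter, rather than folding everything into q, is what lets each filter be cached and reused separately.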

Demo!

Questions?
