Create your own search engine with Apache Solr
1. Create your own search engine with Apache Solr
Alfonso Focareta, alfonso.focareta@pronetics.it (@afocareta), Pro-netics S.p.A.
Angelo Quercioli, angelo.quercioli@pronetics.it, Pro-netics S.p.A.
2. Solr & Lucene
3. Lucene: features
• High performance, full-text & scalable search library
• 100% pure Java
• Focus: Indexing + Searching Documents (a "Document" is just a list of name+value pairs)
• No crawlers or document parsing
• Flexible Text Analysis (tokenizers + token filters)
4. Solr: features
• A full text search server based on Lucene
• XML/HTTP, JSON Interfaces
• Faceted Search (category counting)
• Flexible data schema to define types and fields
• Hit Highlighting
• Configurable Advanced Caching
• Index Replication
• Extensible Open Architecture, Plugins
• Web Administration Interface
• Written in Java 5, deployable as a WAR
5. Solr: license
OPEN SOURCE!!
Apache License
7. Solr: Installing and Starting
• JDK 5 or above installed
• Open http://localhost:8983/solr/admin/ in your web browser for the admin interface
8. Solr: Define a schema.xml
Define a Schema (schema.xml)
The file schema.xml describes the structure of the indexed data:
• Type definitions
• Field definitions
• CopyField section
• Additional definitions
9. Solr: Define a schema.xml (type definition)
Type Definition
List of types and components (simple and complex)
• Primitive types
• WhitespaceTokenizerFactory
• StopFilterFactory
• WordDelimiterFilterFactory
• LowerCaseFilterFactory
• SnowballPorterFilterFactory (stemming)
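To make the idea of a tokenizer followed by token filters concrete, here is a toy Python imitation of such a chain. It is only a sketch: the stop-word list is hypothetical, the word-delimiter step is far simpler than Solr's filter, and stemming is left out.

```python
import re

# Hypothetical stop-word list; Solr reads the real one from a configured file.
STOP_WORDS = {"the", "a", "an", "of", "and"}

def whitespace_tokenize(text):
    """Split on runs of whitespace, like a whitespace tokenizer."""
    return text.split()

def stop_filter(tokens):
    """Drop tokens found in the stop-word list (compared case-insensitively here)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def word_delimiter_filter(tokens):
    """Split tokens on non-alphanumeric ASCII characters, e.g. 'Wi-Fi' -> 'Wi', 'Fi'."""
    parts = []
    for t in tokens:
        parts.extend(p for p in re.split(r"[^0-9A-Za-z]+", t) if p)
    return parts

def lower_case_filter(tokens):
    """Lower-case every token."""
    return [t.lower() for t in tokens]

def analyze(text):
    """Run the tokenizer, then each filter in order."""
    tokens = whitespace_tokenize(text)
    for token_filter in (stop_filter, word_delimiter_filter, lower_case_filter):
        tokens = token_filter(tokens)
    return tokens

print(analyze("The Wi-Fi Router of Rome"))  # ['wi', 'fi', 'router', 'rome']
```

The order of the filters matters: each one receives the token list produced by the previous step.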
10. Solr: Define a schema.xml (type definition- example)
Type Definition - Example
12. Solr: Define a schema.xml (Copy Field- example)
Copy Field
Copies one field to another at index time.
Case #1: Analyze the same field in different ways
– copy into a field with a different analyzer
– boost exact-case, exact-punctuation matches
– language translations, thesaurus, soundex
<field name="title" type="text"/>
<field name="title_exact" type="text_exact" stored="false"/>
<copyField source="title" dest="title_exact"/>
Case #2: Index multiple fields into a single searchable field
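The second case can be pictured with a small Python simulation of what copying does to a document at index time. The field names "author" and "text" and the sample values are hypothetical, and this only mimics the effect; it is not how Solr implements it.

```python
def apply_copy_fields(doc, copy_rules):
    """Return a copy of doc with each source field's original values
    appended to the destination field named in the rule."""
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_rules:
        if source in doc:
            result.setdefault(dest, []).extend(doc[source])
    return result

# Hypothetical rules: gather title and author into one catch-all field.
rules = [("title", "text"), ("author", "text")]
doc = {"title": ["Solr in practice"], "author": ["Alfonso Focareta"]}

indexed = apply_copy_fields(doc, rules)
print(indexed["text"])  # ['Solr in practice', 'Alfonso Focareta']
```

A single query against the destination field then matches text that came from either source field.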
13. Solr: Indexing Method
Indexing Method
You put documents into it (called "indexing") via:
• XML
• JSON
• CSV
• Binary over HTTP (multipart request)
14. Solr: Indexing (Java Api)
Indexing by Solrj
Send an XML message like this:
<add><doc>
<field name="id">043564</field>
<field name="name">Alfonso</field>
<field name="surname">Focareta</field>
<field name="category">developer</field>
<field name="language">Italian</field>
<field name="language">English</field>
</doc></add>
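A message of this shape can also be generated rather than typed by hand. The following Python sketch uses only the standard library; the helper name build_add_message is made up for illustration.

```python
import xml.etree.ElementTree as ET

def build_add_message(fields):
    """Build an <add><doc>...</doc></add> message from (name, value) pairs.
    A repeated name produces a multi-valued field."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")

person = [
    ("id", "043564"),
    ("name", "Alfonso"),
    ("surname", "Focareta"),
    ("category", "developer"),
    ("language", "Italian"),
    ("language", "English"),
]
message = build_add_message(person)
print(message)
```

Building the message with an XML library means characters such as & and < in field values are escaped automatically.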
15. Solr: Indexing (Solrj)
Solrj
SolrJ is a Java client for accessing Solr. It offers a Java interface to
add, update, and query the Solr index.
Example ->
16. Solr: Indexing (Solrj) Example
17. Solr: Delete Document
Delete document(s)
• Delete by ID (most efficient)
<delete>
<id>05591</id>
<id>32552</id>
</delete>
• Delete by Query
<delete>
<query>language:english</query>
</delete>
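Both kinds of delete message can be produced with a few lines of Python. This is a sketch with hypothetical helper names; it only builds the XML text and does not contact a server.

```python
import xml.etree.ElementTree as ET

def delete_by_ids(ids):
    """Build a <delete> message listing one <id> element per document."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")

def delete_by_query(query):
    """Build a <delete> message that removes every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")

print(delete_by_ids(["05591", "32552"]))
# <delete><id>05591</id><id>32552</id></delete>
print(delete_by_query("language:english"))
# <delete><query>language:english</query></delete>
```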
18. Solr: Commit and Optimize
Commit and Optimize
Commit: when you are indexing documents to Solr, none
of the changes you make will appear until you run
the commit command!
Optimize: the command that reorganizes the index
segments (increasing search speed) and removes any deleted
(replaced) documents.
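Commit and optimize are themselves small XML messages, <commit/> and <optimize/>, posted to the update handler. The Python sketch below prepares such requests without sending them; the URL is an assumption based on the default local setup used elsewhere in these slides and may differ in your installation.

```python
import urllib.request

# Assumption: a default local Solr whose XML update handler lives at this URL.
UPDATE_URL = "http://localhost:8983/solr/update"

def update_request(xml_message, url=UPDATE_URL):
    """Prepare (but do not send) a POST carrying one XML update message."""
    return urllib.request.Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

commit = update_request("<commit/>")
optimize = update_request("<optimize/>")
print(commit.get_method(), commit.full_url, commit.data)

# To actually send a request to a running server:
# urllib.request.urlopen(commit)
```

A common pattern is to post many add or delete messages first and issue a single commit at the end, since nothing becomes visible before the commit anyway.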
19. Solr: Searching
Searching
You can search documents in Solr over HTTP or with the SolrJ library.
http://localhost:8983/solr/select?q=language:italian&start=0&rows=2&fl=name,surname
<response>
<result numFound="15" start="0">
<doc>
<str name="name">Angelo</str>
<str name="surname">Quercioli</str>
</doc>
<doc>
<str name="name">Alfonso</str>
<str name="surname">Focareta</str>
</doc>
</result>
</response>
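The same search can be scripted. The Python sketch below builds the query URL and parses an XML response of the shape shown above; the helper names are hypothetical and the sample response is hard-coded, so no server is needed to try it.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://localhost:8983/solr/select"

def select_url(query, start=0, rows=10, fields=None):
    """Build a select URL; urlencode percent-escapes ':' and ','."""
    params = [("q", query), ("start", start), ("rows", rows)]
    if fields:
        params.append(("fl", ",".join(fields)))
    return BASE_URL + "?" + urlencode(params)

def parse_docs(xml_text):
    """Return (numFound, list of {field name: value}) from an XML response."""
    result = ET.fromstring(xml_text).find("result")
    docs = [{f.get("name"): f.text for f in doc} for doc in result.findall("doc")]
    return int(result.get("numFound")), docs

SAMPLE = """<response>
  <result numFound="15" start="0">
    <doc><str name="name">Angelo</str><str name="surname">Quercioli</str></doc>
    <doc><str name="name">Alfonso</str><str name="surname">Focareta</str></doc>
  </result>
</response>"""

print(select_url("language:italian", rows=2, fields=["name", "surname"]))
total, docs = parse_docs(SAMPLE)
print(total, docs)
```

Note that numFound (15) is the total number of matches, while rows=2 limits how many documents come back in one page.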
20. Solr: Searching (Response Format)
Response Format
You can add &wt=json for a JSON-formatted response:
{"response": {"numFound":15, "start":0,
"docs": [
{"name":"Angelo", "surname":"Quercioli"},
{"name":"Alfonso", "surname":"Focareta"}
]
}}
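A JSON response is easy to consume from a script. This Python sketch parses a hard-coded sample, assuming the result set sits under the top-level "response" key, which is how Solr's JSON writer names it.

```python
import json

# Hard-coded sample with the same two documents as the slide.
SAMPLE = '''{"response": {"numFound": 15, "start": 0,
  "docs": [
    {"name": "Angelo", "surname": "Quercioli"},
    {"name": "Alfonso", "surname": "Focareta"}
  ]}}'''

body = json.loads(SAMPLE)["response"]
full_names = [d["name"] + " " + d["surname"] for d in body["docs"]]
print(body["numFound"], full_names)
# 15 ['Angelo Quercioli', 'Alfonso Focareta']
```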
21. Solr: Searching – Query Syntax
Lucene Query Syntax
• italian english
Equiv: italian OR english
QueryParser default operator is "OR"/optional
• Wildcard searches: ang?o, alf*o, rom*
• +italian +english -name:angelo
Equiv: italian AND english NOT name:angelo
• "justice league" -name:aquaman
• releaseDate:[2012-01-01T00:00:00Z TO 2013-12-31T23:59:59Z]
• description:"legge roma"~100
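Date range queries need timestamps in UTC with a trailing Z. As a small illustration, this Python sketch formats two datetime values into the range clause from the list above; the helper names are made up.

```python
from datetime import datetime, timezone

def solr_date(dt):
    """Format a timezone-aware datetime as a UTC timestamp ending in Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def date_range(field, start, end):
    """Build an inclusive range clause: field:[start TO end]."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"

q = date_range(
    "releaseDate",
    datetime(2012, 1, 1, tzinfo=timezone.utc),
    datetime(2013, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(q)  # releaseDate:[2012-01-01T00:00:00Z TO 2013-12-31T23:59:59Z]
```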
22. Solr: Searching – Query Syntax 2
Lucene Query Syntax 2
• *:*
• (angelo AND "pier francesco") OR (+federico +paolo)
23. Solr: Function Query
Function Query
• Allows adding a function of a field value to the score
– Boost recently added or popular documents
• Current parser only supports function notation
• Example: log(sum(popularity,1))
• sum, min, max, log, sqrt, currency, ms … etc
• scale(x, target_min, target_max)
– calculates min & max of x across all docs
• map(x, min, max, target)
– useful for dealing with defaults
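To see what these functions compute, here is a plain-Python imitation of log(sum(popularity,1)), scale and map on a few made-up popularity values. It is a sketch of the arithmetic only (Solr's log is base 10), not of how Solr evaluates functions over the index.

```python
import math

def boost(popularity):
    """log(sum(popularity,1)), using base-10 log as Solr's log does."""
    return math.log10(popularity + 1)

def scale(values, target_min, target_max):
    """Linearly rescale values so the smallest maps to target_min
    and the largest to target_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [target_min for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]

def map_range(x, lo, hi, target):
    """Replace x with target when lo <= x <= hi, otherwise keep x."""
    return target if lo <= x <= hi else x

popularity = [0, 9, 99]  # made-up sample values
print([boost(p) for p in popularity])  # [0.0, 1.0, 2.0]
print(scale(popularity, 1, 2))
print(map_range(0, 0, 0, 1))           # 1
```

The last call shows the "defaults" use case: a field that defaults to 0 is mapped to 1 so it does not zero out a product.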
24. Solr: Boosted Query
Boosted Query
• Score is multiplied instead of added
– New local params {!...} syntax added
&q={!boost b=sqrt(popularity)}"super man"
• Parameter dereferencing in local params
&q={!boost b=$boost v=$userq}
&boost=sqrt(popularity)
&userq="super man"
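Because the local params contain braces, dollar signs and quotes, it is safest to let a library do the URL encoding. This Python sketch builds the dereferenced form of the boosted query; the function name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs

def boosted_params(user_query, boost_function):
    """Build a query string where q dereferences $boost and $userq,
    keeping the user's text out of the local-params expression."""
    return urlencode({
        "q": "{!boost b=$boost v=$userq}",
        "boost": boost_function,
        "userq": user_query,
    })

query_string = boosted_params('"super man"', "sqrt(popularity)")
print(query_string)
print(parse_qs(query_string))  # decodes back to the three original values
```

Keeping the user's query in its own parameter avoids having to escape it inside the {!...} expression.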
25. Solr: Facet Query
Facet Query
Faceted search breaks up search results into multiple categories
http://solr/select?q=foo&wt=json&indent=on
&facet=true&facet.field=cat
&facet.query=price:[0 TO 100]
&facet.query=manu:IBM
{"response":{"numFound":26,"start":0,"docs":[…]},
"facet_counts":{
"facet_queries":{
"price:[0 TO 100]":6,
"manu:IBM":2},
"facet_fields":{
"cat":[ "electronics",14, "memory",3,
"card",2, "connector",2]
}}}
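Field facets arrive as a flat list that alternates value and count. The following Python sketch pairs them up from a hard-coded copy of the response above (the docs list, elided on the slide, is left empty here).

```python
import json

SAMPLE = '''{"response": {"numFound": 26, "start": 0, "docs": []},
 "facet_counts": {
   "facet_queries": {"price:[0 TO 100]": 6, "manu:IBM": 2},
   "facet_fields": {
     "cat": ["electronics", 14, "memory", 3, "card", 2, "connector", 2]}}}'''

def field_counts(flat):
    """Pair up a flat [value, count, value, count, ...] list."""
    return list(zip(flat[0::2], flat[1::2]))

facets = json.loads(SAMPLE)["facet_counts"]
cat = field_counts(facets["facet_fields"]["cat"])
print(cat)
# [('electronics', 14), ('memory', 3), ('card', 2), ('connector', 2)]
print(facets["facet_queries"]["manu:IBM"])  # 2
```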
26. Solr: Filter Query
Filter Query
• Filters are restrictions in addition to the query
• Used in faceting to narrow the results
• Filters are cached separately for speed
1. User queries for memory; the query sent to Solr is
&q=memory&fq=inStock:true&facet=true&…
2. User selects 1GB memory size
&q=memory&fq=inStock:true&fq=size:1GB&…
3. User selects DDR2 memory type
&q=memory&fq=inStock:true&fq=size:1GB
&fq=type:DDR2&…
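The drill-down steps above amount to appending one more fq parameter per user selection. A minimal Python sketch, with a made-up helper name:

```python
from urllib.parse import urlencode

def drill_down_query(user_query, filters):
    """Build a query string with one fq parameter per active filter."""
    params = [("q", user_query)]
    params += [("fq", f) for f in filters]
    params.append(("facet", "true"))
    return urlencode(params)

filters = ["inStock:true"]
step1 = drill_down_query("memory", filters)   # user searches for memory

filters.append("size:1GB")
step2 = drill_down_query("memory", filters)   # user selects 1GB

filters.append("type:DDR2")
step3 = drill_down_query("memory", filters)   # user selects DDR2

for step in (step1, step2, step3):
    print(step)
```

Sending each restriction as its own fq, rather than folding everything into q, lets each filter be cached and reused independently.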
27. Demo!
Demo!
28. Demo!
Questions?