2. Apache Solr is a search server written in Java
and built on the Java search library Lucene.
Open source
Results are returned by a web service as JSON or XML
UTF-8 support
3. eBay
HP
Guardian
Cisco
AT&T
Intuit
Ford
http://wiki.apache.org/solr/PublicServers
4. Lucene: a text search library written in Java
Fast and feature-rich, with an active Apache
development community
Inverted index mechanism - each term/word is mapped
to the content (documents) that contains it
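The inverted index idea can be sketched in a few lines of Python. This is only a toy illustration, not how Lucene is implemented: it assumes whitespace tokenization and lowercasing, and the names build_inverted_index and search are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: terms are split on whitespace; real analyzers do much more.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Look a single term up in the index; unknown terms match nothing."""
    return index.get(term.lower(), [])


docs = {
    1: "Solr is a search server",
    2: "Lucene is a search library",
    3: "Jetty is a server",
}
index = build_inverted_index(docs)
print(search(index, "search"))  # [1, 2]
print(search(index, "server"))  # [1, 3]
```

A lookup goes straight from the term to the matching documents, without scanning every document's text at query time.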
5. Server
Solr 4.3.0
Java servlet container (Tomcat or Jetty)
Java 1.6 or above
Client
Any system that can send and receive data over
HTTP
7. Schema - can be thought of as a database table definition
Core - container for a schema and its index
Collection - handles multiple cores
DIH - Data Import Handler
Request handler - StandardRequestHandler,
DisMaxRequestHandler (multiple fields),
IndexInfoRequestHandler
Response writer - xml, json, python, ruby
8. Start Solr
java -jar start.jar
This starts the Jetty application server on port 8983 and uses
your terminal to display the logging information from Solr.
Index your data
java -jar post.jar *.xml
Interface
http://localhost:8983/solr
10. The Solr Home directory typically contains the following subdirectories:
conf/
This directory is mandatory and must contain your solrconfig.xml
and schema.xml. Any other optional configuration files would also
be kept here.
data/
This directory is the default location where Solr will keep your
index, and is used by the replication scripts for dealing with
snapshots. You can override this location in the
conf/solrconfig.xml. Solr will create this directory if it does not
already exist.
lib/
This directory is optional. If it exists, Solr will load any JARs
found in this directory and use them to resolve any "plugins"
specified in your solrconfig.xml or schema.xml (e.g. Analyzers,
Request Handlers). Alternatively, you can use the <lib>
syntax in conf/solrconfig.xml to direct Solr to your plugins. See
the example conf/solrconfig.xml file for details.
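The layout described above can be checked with a short script. This is a minimal sketch that only tests what the slide calls mandatory (conf/ with solrconfig.xml and schema.xml). The function name check_solr_home is hypothetical and is not part of Solr.

```python
import os

# Files the slide lists as mandatory inside conf/.
REQUIRED_CONF_FILES = ("solrconfig.xml", "schema.xml")


def check_solr_home(solr_home):
    """Return a list of problems found in a Solr Home directory (empty if none)."""
    problems = []
    conf_dir = os.path.join(solr_home, "conf")
    if not os.path.isdir(conf_dir):
        problems.append("missing mandatory directory: conf/")
    else:
        for name in REQUIRED_CONF_FILES:
            if not os.path.isfile(os.path.join(conf_dir, name)):
                problems.append("missing mandatory file: conf/" + name)
    # data/ and lib/ are not checked: data/ is created by Solr on demand
    # and lib/ is optional.
    return problems
```

Calling check_solr_home with the path of your own Solr Home returns an empty list when the mandatory files are in place.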
13. add/update - adds a document to Solr or updates an existing one.
Additions and updates are not available for searching until a commit
takes place.
commit - tells Solr that all changes made since the last commit
should be made available for searching.
optimize - restructures Lucene's files to improve performance for
searching. Optimization is generally good to do when indexing has
completed. If there are frequent updates, you should schedule
optimization for low-usage times. An index does not need to be
optimized to work properly. Optimization can be a time-consuming
process.
delete - can be specified by id or by query. Delete by id deletes
the document with the specified id; delete by query deletes all
documents returned by a query.
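The four operations above correspond to small XML messages posted to the update handler. The sketch below only builds those messages as strings and sends nothing. The helper names and the sample field values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Commit and optimize take no body in their simplest form.
COMMIT_MESSAGE = "<commit/>"
OPTIMIZE_MESSAGE = "<optimize/>"


def add_message(doc):
    """Build an <add> message for one document given as a field-name -> value dict."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def delete_by_id_message(doc_id):
    """Build a <delete> message that removes the document with the given id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query_message(query):
    """Build a <delete> message that removes every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


print(add_message({"id": "book-1", "title": "Solr basics"}))
print(delete_by_id_message("book-1"))
print(delete_by_query_message("*:*"))
```

ElementTree escapes special characters in field values, which is easy to get wrong when assembling the XML by hand.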
14. Supported update formats: XML, JSON, CSV and javabin.
Supported document types include Microsoft Office documents and PDFs.
curl http://localhost:8983/solr/collection1/update/csv -H "Content-type: text/csv; charset=utf-8" --data-binary @D:/Projects/solr-4.3.0/example/exampledocs/books.csv
http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E
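The commit URL above is the update handler address with a percent-encoded <commit/> message in the stream.body parameter. A minimal sketch of building it in Python follows. The helper name stream_body_url is made up for this example, and nothing is sent to the server.

```python
from urllib.parse import quote


def stream_body_url(update_url, message):
    """Append an update message to the URL as a percent-encoded stream.body parameter."""
    # quote() leaves "/" unescaped by default, which matches the URL on the slide.
    return update_url + "?stream.body=" + quote(message)


url = stream_body_url("http://localhost:8983/solr/collection1/update", "<commit/>")
print(url)  # http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E
```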
15. q - The query to search with in Solr. See "Lucene QueryParser Syntax"
for a full description of the syntax. The default sort is score desc,
which means sort by descending score. Appending a semicolon and sort
information to q (q=...; date asc) is legacy syntax from older Solr
releases; use the separate sort parameter with the name of an indexed,
non-tokenized field instead.
Example: q=myField:Java AND otherField:developerWorks&sort=date asc
This query searches the two fields specified and sorts the results
based on a date field.
start - Specifies the starting offset into the result set. Useful for
paging through results. The default value is 0.
Example: start=15
Returns results starting with the sixteenth ranked result (the offset
is zero-based).
rows - The maximum number of documents to return. The default value
is 10.
Example: rows=25
fq - Provides an optional filter query. The results of the main query
are restricted to the documents that also match the filter query.
Filter queries are cached by Solr, which makes them very useful for
improving the speed of complex queries.
Example: any valid query that could be passed in the q parameter, not
including sort information.
hl - When hl=true, highlight snippets in the query response. Default
is false. See the Solr Wiki section on highlighting parameters for
more options.
Example: hl=true
fl - Specifies, as a comma-separated list, the set of fields that
should be returned in the document results. "*" is the default and
means all fields. "score" indicates the score should be returned as
well.
Example: fl=*,score
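These parameters are sent as a URL query string. The sketch below assembles one and turns a zero-based page number into the start offset. The function name select_url and the page argument are assumptions for illustration; only q, start, rows, fq, fl and hl come from the slide.

```python
from urllib.parse import urlencode


def select_url(base_url, q, page=0, rows=10, fq=None, fl=None, hl=False):
    """Build a /select URL from the query parameters described above."""
    # start is a zero-based offset, so page 0 starts at 0, page 1 at `rows`, etc.
    params = [("q", q), ("start", page * rows), ("rows", rows)]
    if fq:
        params.append(("fq", fq))
    if fl:
        params.append(("fl", ",".join(fl)))
    if hl:
        params.append(("hl", "true"))
    # urlencode percent-encodes special characters and writes spaces as "+".
    return base_url + "/select?" + urlencode(params)


# Third page (page index 2) of 25 results, returning only two fields.
print(select_url("http://localhost:8983/solr", "video",
                 page=2, rows=25, fl=["id", "category"]))
```

Computing start from a page number keeps the paging arithmetic in one place.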
16. Full text search
http://localhost:8983/solr/select?q=Searchtext
Search only within a field
http://localhost:8983/solr/select?q=fieldname:searchtext
Control which fields are displayed in result
http://localhost:8983/solr/select?q=video&fl=id,category
Search a range of field values
http://localhost:8983/solr/select?q=price:[0 TO 400]&fl=id,name,price
More like this (MLT)
http://localhost:8983/solr/select?q=Searchtext&mlt=true&mlt.fl=headline&mlt.mindf=1&mlt.mintf=1&fl=id,score&rows=100
More information on how this works and the options available
can be found at http://wiki.apache.org/solr/MoreLikeThis
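Range queries contain brackets, colons and spaces, which should be encoded when the URL is built by a program rather than typed into a browser. The following sketch shows one way to do that; the helper name range_query is hypothetical.

```python
from urllib.parse import urlencode


def range_query(field, low, high):
    """Build an inclusive range query such as price:[0 TO 400]."""
    return "{}:[{} TO {}]".format(field, low, high)


params = {"q": range_query("price", 0, 400), "fl": "id,name,price"}
# urlencode escapes ":", "[", "]" and "," and writes spaces as "+".
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```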
20. Removing Data from Index
curl http://localhost:8983/solr/collection1/update -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
Send a commit afterwards so that the deletion is reflected in search results.