2. Some Lucene (& Solr) History & Stats
• 1997: Doug Cutting creates Lucene
• 2000-2001: SourceForge hosts Lucene
• 2001-present: Lucene @ Apache Software Foundation
• 2006: Flexible indexing planning starts
• 2007: Solr graduates from the Apache Incubator to join the Lucene PMC as a sub-project
• 2008: Flexible indexing implementation begins
• 2010: Lucene and Solr development merge
• 2011: Lucene and Solr 3.1 and all further releases coordinated (13 joint releases so far)
• 2012: Lucene/Solr 4.0 released
3. Lucene 4.0 Highlights
• Flexible indexing: pluggable codecs: index format suites
• Flexible scoring: more index stats & similarities that use them
• Faster multithreaded indexing via concurrent flushing: DWPT
• Doc Values: typed single-valued fields: flexible sorting, scoring
• Norms are now doc values: you can have more than one byte!
• More RAM efficient data structures, e.g. terms dict/idx & fieldcache
• Faster search filtering
• Merge I/O can be rate-limited, to reduce I/O contention
• IndexReader is now per-segment
• Completely reworked spatial search
4. Lucene 4.1 & 4.2 Highlights
• Seeks on writing out index files eliminated
• Compressed stored fields and term vectors
• AnalyzingSuggester and FuzzySuggester
• Lucene facet module improvements: speedups, NRT support, DrillSideways
• PostingsHighlighter: uses postings offsets
• CommonTermsQuery: speeds up queries containing very high-frequency terms
• Doc Values API and performance improvements
• The FST package supports FSTs over 2GB in size
• LiveFieldValues: real-time get for Lucene
• New classification module
5. Lucene 4.3 Highlights
• Major performance improvement for BooleanQuery with minShouldMatch
• SortingAtomicReader and SortingMergePolicy
• DocIdSetIterator and Scorer now have a cost API
• Analyzing/FuzzySuggester can now record an arbitrary byte[] as a payload
• Spatial module: support for query relations Within, Contains, and Disjoint
• Facet module: new method computes facet counts using SortedSetDocValuesField, without a separate taxonomy index
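One way to see why a cost estimate on iterators is useful: when intersecting several doc-ID iterators, the cheapest one can drive the loop so the expensive ones are only asked to skip ahead. The sketch below is a simplified Python model of that idea, not Lucene's Java code; `ListIterator`, `conjunction`, and the sentinel value are hypothetical names for illustration, and "cost" is assumed to be an estimate of how many documents an iterator may visit.

```python
from bisect import bisect_left

NO_MORE_DOCS = float("inf")  # sentinel for an exhausted iterator (illustrative)


class ListIterator:
    """Toy doc-ID iterator over a sorted list (hypothetical, for illustration)."""

    def __init__(self, doc_ids):
        self.doc_ids = sorted(doc_ids)
        self.pos = -1

    def cost(self):
        # Estimated number of documents this iterator may visit.
        return len(self.doc_ids)

    def advance(self, target):
        # Move to the first doc ID >= target, never moving backwards.
        self.pos = bisect_left(self.doc_ids, target, max(self.pos, 0))
        if self.pos < len(self.doc_ids):
            return self.doc_ids[self.pos]
        return NO_MORE_DOCS


def conjunction(iterators):
    """Intersect iterators, letting the lowest-cost one lead."""
    lead, *others = sorted(iterators, key=lambda it: it.cost())
    matches = []
    doc = lead.advance(0)
    while doc != NO_MORE_DOCS:
        for other in others:
            found = other.advance(doc)
            if found != doc:
                # Mismatch: move the lead up to the next candidate, or stop.
                doc = NO_MORE_DOCS if found == NO_MORE_DOCS else lead.advance(found)
                break
        else:
            # Every iterator is positioned on the same document.
            matches.append(doc)
            doc = lead.advance(doc + 1)
    return matches
```

With a rare term (3 documents) and a common one (1,000 documents), the rare iterator leads, so the loop runs a handful of times instead of a thousand.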
6. On the horizon
• More efficient positional queries
• Incremental field updates
• Korean Analyzer
8. Solr Developer/User survey, April 2013
• Survey invitation emailed to 4,136 people:
– LucidWorks training class attendees
– Revolution attendees
– LucidWorks webinar registrants
• 177 have responded so far
9. Please rank the following features by priority
Answered: 165 Skipped: 12
15. More questions
1. How many attendees are Eclipse developers?
2. How many attendees are running SolrCloud in production?
17. Origins of Solr
• CNET needed an alternative to a discontinued commercial enterprise search product
• Plan A: ATOMICS (Apache TO MySQL In CNET Search)
– Standalone server speaking XML over HTTP
– Meant to meet the majority of “search” needs
– http://conferences.oreillynet.com/cs/mysqluc2005/view/e_sess/7066
• Plan B: “Something based on Lucene”
– Started Summer 2004
– First prototype called “Fusion”, later renamed SOLAR (Search On Lucene And Resin)
25. Seamless Online Shard Splitting
[Diagram: Shard1, Shard2, and Shard3, each with a leader and a replica; an incoming update reaches Shard2, which is being split into sub-shards Shard2_0 and Shard2_1]
1. New sub-shards created in “construction” state
2. Leader starts forwarding applicable updates, which are buffered by the sub-shards
3. Leader index is split and installed on the sub-shards
4. Sub-shards apply buffered updates then become “active” leaders and old shard becomes “inactive”
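The four steps above can be modelled in a few lines. This is a toy Python simulation under stated assumptions, not Solr's implementation: the `Shard` class, the integer hash ranges, and the idea that the split works from a snapshot of the leader's index taken when forwarding begins are all illustrative choices.

```python
class Shard:
    """Toy shard owning an inclusive hash range (hypothetical model)."""

    def __init__(self, name, lo, hi, state="active"):
        self.name, self.lo, self.hi, self.state = name, lo, hi, state
        self.docs = {}    # doc hash -> document
        self.buffer = []  # updates received while under construction

    def covers(self, doc_hash):
        return self.lo <= doc_hash <= self.hi


def split_shard(parent, updates_during_split):
    """Split `parent` in two; `updates_during_split` are (hash, doc) pairs
    that reach the parent leader while the split is in progress."""
    mid = (parent.lo + parent.hi) // 2

    # 1. New sub-shards are created in the "construction" state.
    subs = [
        Shard(parent.name + "_0", parent.lo, mid, "construction"),
        Shard(parent.name + "_1", mid + 1, parent.hi, "construction"),
    ]

    # Assumption: the split works from a snapshot of the leader's index.
    snapshot = dict(parent.docs)

    # 2. The leader keeps indexing and forwards each update to the
    #    sub-shard whose range covers it; the sub-shard only buffers it.
    for doc_hash, doc in updates_during_split:
        parent.docs[doc_hash] = doc
        target = next(s for s in subs if s.covers(doc_hash))
        target.buffer.append((doc_hash, doc))

    # 3. The leader index is split by hash range and installed.
    for doc_hash, doc in snapshot.items():
        next(s for s in subs if s.covers(doc_hash)).docs[doc_hash] = doc

    # 4. Sub-shards replay buffered updates and become active;
    #    the old shard becomes inactive.
    for sub in subs:
        for doc_hash, doc in sub.buffer:
            sub.docs[doc_hash] = doc
        sub.buffer.clear()
        sub.state = "active"
    parent.state = "inactive"
    return subs
```

Buffering is what makes the split "online": updates that arrive after the snapshot are not lost, because each sub-shard replays them before it starts serving. In Solr itself the split is requested through the Collections API (action=SPLITSHARD).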
26. Cloud Enhancements
• Request forwarding
– In a multi-collection cluster, any node can
handle/forward requests for any collection
• Collection Aliases
http://localhost:8983/solr/admin/collections
?action=CREATEALIAS
&name=northeast
&collections=NY,NJ,PA,CT,ME,MA,NH,RI,VT
• Coming Soon: Shard Aliases
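A small sketch of building the CREATEALIAS request shown above from a list of collection names. The helper name `create_alias_url` is hypothetical; the endpoint and parameters are the ones on the slide, and the code only builds the URL (it does not contact a server).

```python
from urllib.parse import urlencode


def create_alias_url(base, alias, collections):
    """Build a Collections API CREATEALIAS URL (helper name is illustrative)."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    # Keep commas unescaped so the URL reads like the one on the slide.
    return base + "/admin/collections?" + urlencode(params, safe=",")


url = create_alias_url(
    "http://localhost:8983/solr",
    "northeast",
    ["NY", "NJ", "PA", "CT", "ME", "MA", "NH", "RI", "VT"],
)
print(url)
```

Once the alias exists, clients can address `northeast` as if it were a single collection.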
27. Schema REST API
• Restlet is now integrated with Solr
• Get a specific field
curl http://localhost:8983/solr/schema/fields/price
{"field":{
"name":"price",
"type":"float",
"indexed":true,
"stored":true }}
• Get all fields
curl http://localhost:8983/solr/schema/fields
• Get Entire Schema!
curl http://localhost:8983/solr/schema
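The same read-only calls can be made from a program. The following Python sketch assumes the base URL and the single-field response shape shown above; the helper names (`field_url`, `parse_field`, `get_field`) are hypothetical, and `get_field` needs a running Solr instance.

```python
import json
from urllib.request import urlopen

SCHEMA_BASE = "http://localhost:8983/solr/schema"  # base URL from the slide


def field_url(name=None):
    """URL for one field, or for all fields when no name is given."""
    return SCHEMA_BASE + "/fields" + ("/" + name if name else "")


def parse_field(body):
    """Extract the field definition from a single-field response body."""
    return json.loads(body)["field"]


def get_field(name):
    """Fetch and parse one field definition (requires a live server)."""
    with urlopen(field_url(name)) as resp:
        return parse_field(resp.read().decode("utf-8"))


# Parsing the sample response from the slide, without a server:
sample = '{"field":{"name":"price","type":"float","indexed":true,"stored":true }}'
price = parse_field(sample)
print(price["type"])
```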
28. Dynamic Schema
• Add a new field (Solr 4.4)
curl -XPUT http://localhost:8983/solr/schema/fields/strength -d '
{"type":"float", "indexed":"true"}
'
• Works in distributed (cloud) mode too!
• Future: More schemaless
– Reality: there is no such thing for Lucene-based systems
– Type guessing for fields we haven’t seen before
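A sketch of the add-field call above as a Python request object. The helper name `add_field_request` is hypothetical, and the `Content-Type: application/json` header is an assumption that does not appear in the curl command on the slide. The request is only constructed here; sending it requires a running Solr with a mutable schema.

```python
import json
from urllib.request import Request


def add_field_request(name, definition, base="http://localhost:8983/solr/schema"):
    """Build (but do not send) a PUT request that adds a schema field."""
    body = json.dumps(definition).encode("utf-8")
    return Request(
        base + "/fields/" + name,
        data=body,
        method="PUT",
        # Assumption: the server expects the body to be declared as JSON.
        headers={"Content-Type": "application/json"},
    )


req = add_field_request("strength", {"type": "float", "indexed": "true"})
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```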
29. Future
• Greater scalability
• More “NoSQL”
– More ways to update & manipulate documents
• Analytics
– More powerful faceting, functions, statistics
• Improved relational queries
• More dynamic (settings & configuration)
• Continued focus on ease of use