This presentation is loosely based on my write-ups of the two days of the Lucene Revolution 2013 conference held in Dublin:
http://dmitrykan.blogspot.fi/2013/11/lucene-revolution-eu-2013-in-dublin-day.html
http://dmitrykan.blogspot.fi/2013/11/lucene-revolution-eu-2013-in-dublin-day_13.html
3. Day 1 (Wednesday): conference
1. Keynote by Michael Busch of Twitter
2. Integrating Solr and Storm by Timothy Potter
3. Lucene at LinkedIn by LinkedIn engineers
4. Additions to Lucene arsenal by Adrien Grand and Shai Erera
4. Day 1: conference
5. Shrinking the Haystack with SOLR and OpenNLP
6. Parboiled for generating query parsers: SWAN = SAME, WITHIN, ADJ, NEAR
7. Stump the Chump! ->
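The SWAN item above names four operators (SAME, WITHIN, ADJ, NEAR) for which a query parser was generated with Parboiled. The slide gives no grammar, so what follows is only a minimal sketch of what such a parser has to do, under stated assumptions: it is a hand-written recursive-descent parser in Python, not a Parboiled-generated one; it treats all four operators as plain binary infix keywords (any distance argument, e.g. for WITHIN, is ignored); and the grammar, `SwanParser` and `tokenize` are hypothetical. It only builds a parse tree and does not define what each operator matches.

```python
# Hypothetical SWAN grammar (an assumption; the slide gives none):
#   query   := operand (OPERATOR operand)*
#   operand := "(" query ")" | TERM
SWAN_OPERATORS = {"SAME", "WITHIN", "ADJ", "NEAR"}


def tokenize(text):
    """Split on whitespace, keeping parentheses as separate tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


class SwanParser:
    """Hand-written recursive-descent parser producing nested tuples."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self):
        tree = self.query()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def query(self):
        tree = self.operand()
        while self.peek() in SWAN_OPERATORS:
            op = self.advance()
            tree = (op, tree, self.operand())  # operators are left-associative
        return tree

    def operand(self):
        token = self.advance()
        if token == "(":
            tree = self.query()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return tree
        if token == ")" or token in SWAN_OPERATORS:
            raise ValueError(f"unexpected token {token!r}")
        return token  # a plain search term


print(SwanParser("apache ADJ lucene NEAR (solr SAME storm)").parse())
# ('NEAR', ('ADJ', 'apache', 'lucene'), ('SAME', 'solr', 'storm'))
```

A PEG tool such as Parboiled lets these rules be written down as a grammar in code instead of being hand-coded method by method as above.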
5. Stump the Chump
We use filter queries a lot. Some of these are long boolean queries. Of those, some are static, i.e. they do not change every day, only occasionally. An example would be:
fq=Country:(Angora OR Russia OR US) // relatively small set of potentially groupable entries (i.e. group labels can be created to shorten a query).
The others are very dynamic, changing practically every day. An example would be:
fq=UserId:(userid1 OR userid9 OR...) // veeeery long boolean query, thousands of ungroupable entries
If we don't cache the dynamic filter queries, we save cache space for the useful filter queries but slow down execution. If we do cache them, we risk flushing the cache quickly.
Is there a smart way of handling such a situation?
Regards,
Dmitry Kan
6. STUMPED!
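To make the trade-off in the question concrete, here is a toy simulation, not Solr code. Assumptions: the filter cache is modelled as a plain LRU cache keyed by the filter-query string (Solr's filterCache implementation and sizing differ); each request carries one of two reusable static filters plus a user-id filter that never repeats; and the second Country filter, the user ids, the cache size of 2 and the names `LRUFilterCache` and `static_hits` are all hypothetical. It counts how often the static filters are served from the cache.

```python
from collections import OrderedDict


class LRUFilterCache:
    """Toy LRU cache keyed by filter-query string (a stand-in, not Solr's filterCache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, fq, cacheable=True):
        """Return True on a hit; on a miss, store the filter only if cacheable."""
        if fq in self.entries:
            self.entries.move_to_end(fq)  # mark as most recently used
            return True
        if cacheable:
            self.entries[fq] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return False


def static_hits(requests, cache_dynamic, capacity=2):
    """Count cache hits on the static Country filters over `requests` requests."""
    cache = LRUFilterCache(capacity)
    static = ["Country:(Angora OR Russia OR US)", "Country:(Finland OR Sweden)"]  # 2nd is hypothetical
    hits = 0
    for r in range(requests):
        if cache.lookup(static[r % len(static)]):
            hits += 1
        dynamic = f"UserId:(userid{r} OR userid{r + 1000})"  # hypothetical, never repeats
        cache.lookup(dynamic, cacheable=cache_dynamic)
    return hits


print(static_hits(10, cache_dynamic=True))   # 0
print(static_hits(10, cache_dynamic=False))  # 8
```

With the dynamic filters cached, every unique user-id list pushes a static filter out before it is reused; keeping them out of the cache lets the static filters stay resident, at the cost of recomputing each dynamic filter on every request. In Solr, the `cache=false` local parameter keeps an individual filter query out of the filterCache, e.g. `fq={!cache=false}UserId:(userid1 OR userid9 OR...)`.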
7. Day 2 (Thursday)
1. Discussion panel with LucidWorks CEO
2. “Lucene Search Essentials: Scorers, Collectors and Custom Queries” by Mikhail Khludnev
3. Text classification and Apache Mahout by Isabel Drost
8. Day 2
4. Turning search upside down by Alan Woodward and Charlie Hull
[Diagram: TOPIC_TAXONOMY]
5. What is in a Lucene index by Adrien Grand
9. Lucene @ Twitter
1. Just Lucene, no SOLR
2. Index in RAM (2 weeks)
3. Postings lists are sorted by time, so the index reader reads from the end and gets the freshest data first
4. No commits => no index reopening!
5. Keep promising code -> Apache
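The time-ordered postings idea above can be illustrated with a toy in-memory index. This is a minimal sketch under stated assumptions, not Twitter's implementation: documents get increasing ids in arrival order, so appending to a term's postings list keeps it sorted by time; search walks each list from the end to return the newest matches first; and because search reads the live in-memory lists, a newly added document is searchable without any commit or reopen step. Concurrency between the writer and readers, which a real system must handle, is ignored, and `InMemoryTimeOrderedIndex` is a hypothetical name.

```python
from collections import defaultdict
from itertools import islice


class InMemoryTimeOrderedIndex:
    """Toy append-only in-memory index (hypothetical; not Twitter's code).

    Doc ids are assigned in arrival order, so every postings list is
    automatically sorted by time, oldest first.
    """

    def __init__(self):
        self.postings = defaultdict(list)  # term -> ascending doc ids
        self.docs = []                     # doc id -> original text

    def add(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in set(text.lower().split()):
            # Appending a larger id keeps the list sorted by time.
            self.postings[term].append(doc_id)
        # No commit and no reader reopen: search() reads these lists directly.
        return doc_id

    def search(self, term, limit=10):
        """Return up to `limit` matching doc ids, newest first."""
        matches = self.postings.get(term.lower(), [])
        # Walk the postings list from the end to get the freshest docs first.
        return list(islice(reversed(matches), limit))


index = InMemoryTimeOrderedIndex()
index.add("lucene revolution dublin")
index.add("solr and storm")
index.add("lucene at twitter")
print(index.search("lucene"))  # [2, 0]
```

Reading from the end also means a top-N query over recent data can stop as soon as it has N hits, without touching older postings.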