What's New in Solr?


        code4lib 2011 preconference
               Bloomington, IN
presented by Erik Hatcher of Lucid Imagination
about me
spoken at several code4lib conferences

   Keynoted Athens '07 along with the pioneering Solr preconference,

   Providence '09, "Rising Sun"

   pre-conferenced Asheville '10, "Solr Black Belt"

co-authored "Lucene in Action", first edition; ghost/toast on second edition

Lucene and Solr committer.

library world claims to fame are founding and naming Blacklight, original developer on
Collex and the Rossetti Archive search

now at Lucid Imagination, dedicated to Lucene/Solr support/services/training/etc
abstract
 The library world is fired up about Solr. Practically every
next-gen catalog is using it (via Blacklight, VuFind, or other
    technologies). Solr has continued improving in some
     dramatic ways, including geospatial support, field
collapsing/grouping, extended dismax query parsing, pivot/
   grid/matrix/tree faceting, autosuggest, and more. This
 session will cover all of these new features, showcasing
 live examples of them all, including anything new that is
           implemented prior to the conference.
LIA2 - Lucene in Action
Published: July 2010 - http://www.manning.com/lucene/
New in this second edition:
   Performing hot backups
   Using numeric fields
   Tuning for indexing or searching speed
   Boosting matches with payloads
   Creating reusable analyzers
   Adding concurrency with threads
   Four new case studies, and more
Version Number
Which one ya talking 'bout, Willis?

  3.1? 4.0?? TRUNK??

playing with fire

  index format changes to be expected

     reindexing recommended/required

Solr/Lucene merged development codebases

  releases should occur lock-step moving forward
dependencies

November 2009: Solr 1.4 (Lucene 2.9.1)

June 2010: Solr 1.4.1 (Lucene 2.9.3)

Spring 2011(?): Solr 3.1 (Lucene 3.1)

TRUNK: Solr 4.x (Lucene TRUNK)
lucene
per-segment field cache, etc

Unicode and analysis improvements throughout

Analysis "attributes"

AutomatonQuery: RegexpQuery, WildcardQuery

flexible indexing

and so much more!
README

Reindex!

Upgrade SolrJ libraries too (javabin format
changed)

Read Lucene and Solr's CHANGES.txt files for all
the details
Analysis

UAX, using ICU

CollationKey

PatternReplaceCharFilter

KeywordMarkerFilterFactory,
StemmerOverrideFilterFactory
Standard tokenization

ClassicTokenizer: old StandardTokenizer

StandardTokenizer: now uses Unicode text
segmentation specified by UAX#29

UAX29URLEmailTokenizer

maxTokenLength: default=255
PathHierarchyTokenizer


delimiter: default=/

replace: default=<delimiter>

"/foo/bar" => [/foo] [/foo/bar]
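
The prefix expansion shown above can be mimicked in a few lines of Python. This is only a sketch of the behaviour on the slide, not the Lucene implementation; the function name `path_hierarchy_tokens` and its `replacement` argument are made up for illustration.

```python
def path_hierarchy_tokens(path, delimiter="/", replacement=None):
    """Emit one token per path prefix, as in "/foo/bar" => [/foo] [/foo/bar]."""
    if replacement is None:
        replacement = delimiter  # the slide says replace defaults to the delimiter
    tokens = []
    current = ""
    for i, part in enumerate(path.split(delimiter)):
        if i > 0:
            current += replacement
        current += part
        if part:  # skip the empty piece in front of a leading delimiter
            tokens.append(current)
    return tokens


print(path_hierarchy_tokens("/foo/bar"))  # ['/foo', '/foo/bar']
```

Indexing every prefix this way is what lets a filter on "/foo" match documents stored under "/foo/bar".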
CollationKeyFilter
A filter that lets one specify:

   A system collator associated with a locale, or

   A collator based on custom rules

This can be used for changing sort order for non-English languages as well as
to modify the collation sequence for certain languages. You must use the same
CollationKeyFilter at both index-time and query-time for correct results. Also,
the JVM vendor and version (including patch version) of the slave should be
exactly the same as the master (or indexer) for consistent results.

http://wiki.apache.org/solr/UnicodeCollation

see also: ICUCollationKeyFilter
ICU
International Components for Unicode

ICUFoldingFilter

ICUNormalizer2Filter

  name=nfc|nfkc|nfkc_cf

  mode=compose|decompose

  filter
ICUFoldingFilter
Accent removal, case folding, canonical duplicates folding, dashes
folding, diacritic removal (including stroke, hook, descender), Greek letterforms
folding, Han Radical folding, Hebrew Alternates folding, Jamo folding,
Letterforms folding, Math symbol folding, Multigraph Expansions: All, Native
digit folding, No-break folding, Overline folding, Positional forms folding, Small
forms folding, Space folding, Spacing Accents folding, Subscript folding,
Superscript folding, Suzhou Numeral folding, Symbol folding, Underline folding,
Vertical forms folding, Width folding

Additionally, Default Ignorables are removed, and text is normalized to NFKC.

 All foldings, case folding, and normalization mappings are applied recursively
to ensure a fully folded and normalized result.
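
To get a feel for what folding does, here is a rough approximation using only Python's standard `unicodedata` module. It is not ICU and covers only a small part of the list above (accent removal, case folding, compatibility normalization); the function name `rough_fold` is hypothetical.

```python
import unicodedata


def rough_fold(text):
    """Very rough stand-in for folding: strip accents, fold case, normalize."""
    # NFKD splits accented letters into base letter + combining mark and also
    # applies compatibility mappings (ligatures, full-width forms, ...).
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case-fold, then recompose to NFKC as the slide describes.
    return unicodedata.normalize("NFKC", without_marks.casefold())


print(rough_fold("Résumé"))  # resume
print(rough_fold("Ｓｏｌｒ"))  # solr
```

Applying the same folding at index time and query time is what makes "resume" find "Résumé".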
ICUTransformFilter
id: specific transliterator identifier from ICU's
Transliterator#getAvailableIDs() (required)

direction=forward|reverse

Examples:

  Traditional-Simplified:         =>

  Cyrillic-Latin: Российская Федерация =>
  Rossijskaâ Federaciâ
Tom Burton-West's
latest

ICU

shingles

query parser

ABC -> [A] [B] [C] or [AB] [BC]...
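
The two tokenizations on this slide (single characters versus overlapping bigrams) can be sketched as below. This is plain Python for illustration only, not Solr's analysis chain; both function names are hypothetical.

```python
def unigrams(text):
    """ABC -> [A] [B] [C]"""
    return list(text)


def bigrams(text):
    """ABC -> [AB] [BC]: every pair of adjacent characters."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


print(unigrams("ABC"))  # ['A', 'B', 'C']
print(bigrams("ABC"))   # ['AB', 'BC']
```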
highlighter


old highlighter config deprecated; highlighting is now configured as a
standard search component

FastVectorHighlighter
FastVectorHighlighter

if termVectors="true", termPositions="true", and
termOffsets="true"

and hl.useFastVectorHighlighter=true

  hl.fragListBuilder

  hl.fragmentsBuilder
spatial
JTeam's plugin: packaged for easy deployment

Solr trunk capabilities

many distance functions

What's missing?

  geo faceting? scoring by distance? distance
  pseudo-field?

All units in kilometers, unless otherwise specified
Spatial field types

Point: n-dimensional, must specify dimension
(default=2), represented by N subfields internally

LatLon: latitude,longitude, represented by two
subfields internally, single valued only

GeoHash: single string representation of lat/lon
Spatial query parsers
geofilt: exact filtering

bbox: uses (trie) range queries

Parameters:

  sfield: spatial field

  pt: reference point

  d: distance
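
A minimal sketch of how the three parameters combine into a filter query, written as local params on the `fq` value. The field name `store` and the point are borrowed from the sort example later in this deck; the distance of 5 and the helper name `spatial_filter` are assumptions for illustration.

```python
from urllib.parse import urlencode


def spatial_filter(parser, sfield, lat, lon, distance_km):
    """Build an fq value such as {!geofilt sfield=store pt=LAT,LON d=5}."""
    if parser not in ("geofilt", "bbox"):
        raise ValueError("parser must be 'geofilt' or 'bbox'")
    return "{!%s sfield=%s pt=%s,%s d=%s}" % (parser, sfield, lat, lon, distance_km)


fq = spatial_filter("geofilt", "store", 39.194564, -86.432947, 5)
query_string = urlencode({"q": "*:*", "fq": fq})
print(fq)  # {!geofilt sfield=store pt=39.194564,-86.432947 d=5}
```

Swapping "geofilt" for "bbox" keeps the same parameters but trades exactness for the cheaper range-query approach noted above.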
field collapsing/grouping
backwards compatibility mode?

http://wiki.apache.org/solr/FieldCollapsing

group=true

group.field / group.func / group.query

rows / start: for groups, not documents

group.limit: number of results per group

group.offset: offset into doclist of each group

sort: how to sort groups, by top document in each group

group.sort: how to sort docs within each group

group.format: grouped | simple

group.main=true|false

faceting works as normal

not distributed savvy yet
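
The grouping parameters on this slide can be assembled into a request like this. It is a sketch only: the field name `author_s` and the helper `grouping_params` are hypothetical, and nothing is sent to a server.

```python
from urllib.parse import urlencode


def grouping_params(query, field, per_group=1, groups=10):
    """Collect the request parameters for a simple field-grouped search."""
    return {
        "q": query,
        "group": "true",
        "group.field": field,
        "group.limit": per_group,  # documents returned inside each group
        "rows": groups,            # with grouping on, rows counts groups
    }


# "author_s" is a made-up field name for illustration.
query_string = urlencode(grouping_params("solr", "author_s", per_group=3))
print(query_string)
```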
query parsing


TextField: autoGeneratePhraseQueries="true"

  if single string analyzes to multiple tokens
{!raw|term|field f=$f}...
Recall why we needed {!raw} from last year

<fieldType = .../> - use one string, one numeric, (and one text?)

<field name="..."/>

table for numeric and for string (and text?):

    {!raw f=$f} | TermQuery(...)

    {!term f=$f} | ...

    {!field f=$f} | ...

Which to use when? {!raw} works for strings just fine, but best to migrate to the generally
safer/wiser {!term} for future-proofing.
{!term f=field}


fq={!term f=weight}1.5
dismax

q.op or schema.xml's <solrQueryParser
defaultOperator="[AND|OR]"/> defaults mm to 0%
(OR) or 100% (AND)

#code4lib: issues with non-analyzed fields in qf
edismax
Supports full lucene query syntax in the absence of syntax errors


supports "and"/"or" to mean "AND"/"OR" in lucene syntax mode


When there are syntax errors, improved smart partial escaping of special characters is done to prevent
them... in this mode, fielded queries, +/-, and phrase queries are still supported.


Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in
the document to get any boost, as well as having all of the words in a single field.


advanced stopword handling... stopwords are not required in the mandatory part of the query but are still
used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be)
then all will be required.


Supports the "boost" parameter... like the dismax bf param, but multiplies the function query instead of
adding it in


Supports pure negative nested queries... so a query like +foo (-foo) will match all documents
function queries


termfreq, tf, docfreq, idf, norm, maxdoc, numdocs

{!func}termfreq(text,ipod)

standard java.util.Math functions
faceting
per-segment, single-valued fields:

   facet.method=fcs (field cache per segment)

   facet.field={!threads=-1}field_name

       threads=0: direct execution

       threads=-1: thread per segment

   speeds up single and multivalued method=fc, especially for deep paging with
   facet.offset

date faceting improvements, generalized for numeric ranges too

can now exclude main query: q={!tag=main}the+query&facet.field={!ex=main}category
pivot/grid/matrix/tree
faceting


is this also "hierarchical faceting"? it depends!
pivot faceting output
/select?q=*:*&rows=0&facet=on
&facet.pivot=cat,popularity,inStock
&facet.pivot=popularity,cat
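
The request above repeats `facet.pivot`, which a plain dict cannot express. A small sketch of building the same URL with a list of pairs, so both pivots survive encoding (the host and path are the ones used elsewhere in this deck):

```python
from urllib.parse import urlencode

# A list of (name, value) pairs keeps repeated parameters in order.
params = [
    ("q", "*:*"),
    ("rows", 0),
    ("facet", "on"),
    ("facet.pivot", "cat,popularity,inStock"),
    ("facet.pivot", "popularity,cat"),
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```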
spell checking


DirectSolrSpellChecker

  no external index needed, uses automaton on
  main index
spellcheck config
solrconfig.xml
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">textgen</str>

    <!-- a spellchecker that uses no auxiliary index -->
     <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="minPrefix">1</str>
    </lst>
  </searchComponent>
spellcheck handler

solrconfig.xml
  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <str name="spellcheck">true</str>
       <str name="spellcheck.collate">true</str>
     </lst>

    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
spellcheck response
http://localhost:8983/solr/select?q=ipud%20bluck&wt=ruby&indent=on
                        {
                            'responseHeader'=>{
                               'status'=>0,
                               'QTime'=>10,
                               'params'=>{
                                 'indent'=>'on',
                                 'wt'=>'ruby',
                                 'q'=>'ipud bluck'}},
                            'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
                            },
                            'spellcheck'=>{
                               'suggestions'=>[
                                 'ipud',{
                                   'numFound'=>1,
                                   'startOffset'=>0,
                                   'endOffset'=>4,
                                   'suggestion'=>['ipod']},
                                 'bluck',{
                                   'numFound'=>1,
                                   'startOffset'=>5,
                                   'endOffset'=>10,
                                   'suggestion'=>['black']},
                                 'collation','ipod black']}}
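
The `suggestions` value is a flat list that alternates names and values, which is awkward to consume directly. A minimal sketch of unpacking it, using the response above transcribed into Python literals (the function name `parse_suggestions` is hypothetical):

```python
def parse_suggestions(flat):
    """Split a flat [name, value, name, value, ...] suggestions list."""
    corrections = {}
    collation = None
    for name, value in zip(flat[0::2], flat[1::2]):
        if name == "collation":
            collation = value
        else:
            corrections[name] = value["suggestion"]
    return corrections, collation


suggestions = [
    "ipud", {"numFound": 1, "startOffset": 0, "endOffset": 4,
             "suggestion": ["ipod"]},
    "bluck", {"numFound": 1, "startOffset": 5, "endOffset": 10,
              "suggestion": ["black"]},
    "collation", "ipod black",
]
corrections, collation = parse_suggestions(suggestions)
print(corrections)  # {'ipud': ['ipod'], 'bluck': ['black']}
print(collation)    # ipod black
```

One caveat of this simple approach: a misspelled query term that is literally "collation" would be mistaken for the collation entry.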
autosuggest

new "spellcheck" component, builds TST

collates query

can check if collated suggestions yield results,
optionally, providing hit count
suggest config
solrconfig.xml
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">textgen</str>


    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">
        org.apache.solr.spelling.suggest.jaspell.JaspellLookup
      </str>
      <str name="field">suggest</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

schema.xml
   <field name="suggest" type="textgen" indexed="true" stored="false"/>

   <copyField source="name" dest="suggest"/>
suggest handler
solrconfig.xml
  <requestHandler class="solr.SearchHandler" name="/suggest">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.count">10</str>
      <str name="rows">0</str>
      <str name="spellcheck.maxCollationTries">20</str>
      <str name="spellcheck.maxCollations">10</str>
      <str name="spellcheck.collateExtendedResults">true</str>
    </lst>
    <arr name="components">
      <str>query</str> <!-- to allow suggestion hit counts to be returned -->
      <str>spellcheck</str>
    </arr>
  </requestHandler>
suggest response
http://localhost:8983/solr/suggest?q=ip&wt=ruby&indent=on
         {
             'responseHeader'=>{
                'status'=>0,
                'QTime'=>2},
             'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
             },
             'spellcheck'=>{
                'suggestions'=>[
                  'ip',{
                    'numFound'=>1,
                    'startOffset'=>0,
                    'endOffset'=>2,
                    'suggestion'=>['ipod']},
                  'collation',[
                    'collationQuery','ipod',
                    'hits',3,
                    'misspellingsAndCorrections',[
                      'ip','ipod']]]}}
sort

by function

  &q=*:*&sfield=store&pt=39.194564,-86.432947&sort=geodist() asc

but still can't get value of function back

  unless you force it to be the score somehow
clustering component


now works out-of-the-box; all Apache license
compatible

supports distributed search
debug=true


debug=true|all|timing|query|results

debug=results&debug.explain.structured=true
structured explain
http://localhost:8983/solr/select?q=title:solr
&debug.explain.structured=true&debug=results
&wt=ruby&indent=on
     'debug'=>{
       'explain'=>{
         'doc1'=>{
           'match'=>true,
           'value'=>0.076713204,
           'description'=>'fieldWeight(title:solr in 0), product of:',
           'details'=>[{
                'match'=>true,
                'value'=>1.0,
                'description'=>'tf(termFreq(title:solr)=1)'},
             {
                'match'=>true,
                'value'=>0.30685282,
                'description'=>'idf(docFreq=1, maxDocs=1)'},
             {
                'match'=>true,
                'value'=>0.25,
                'description'=>'fieldNorm(field=title, doc=0)'}]}}}}
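
Because the explanation is now structured data, it can be checked programmatically. The sketch below transcribes the `doc1` entry into Python and confirms that the top-level value is the product of its details. The idf line assumes Lucene's classic similarity formula, 1 + ln(maxDocs / (docFreq + 1)); that formula is not stated on the slide.

```python
import math

explain = {
    "match": True,
    "value": 0.076713204,
    "description": "fieldWeight(title:solr in 0), product of:",
    "details": [
        {"match": True, "value": 1.0,
         "description": "tf(termFreq(title:solr)=1)"},
        {"match": True, "value": 0.30685282,
         "description": "idf(docFreq=1, maxDocs=1)"},
        {"match": True, "value": 0.25,
         "description": "fieldNorm(field=title, doc=0)"},
    ],
}

# "product of:" means the parent value is tf * idf * fieldNorm.
product = math.prod(d["value"] for d in explain["details"])
print(math.isclose(product, explain["value"], rel_tol=1e-6))

# Assumed classic idf formula with docFreq=1 and maxDocs=1.
idf = 1 + math.log(1 / (1 + 1))
print(math.isclose(idf, explain["details"][1]["value"], rel_tol=1e-6))
```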
SolrCloud

shared/central config and core/shard management
via ZooKeeper,

built-in load balancing, and infrastructure for future
SolrCloud work.
/update/json
solrconfig.xml
  <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler"/>




     curl 'http://localhost:8983/solr/update/json?commit=true' \
          -H 'Content-type:application/json' -d '
     {
       "add": {
         "doc": {
           "id" : "MyTestDocument",
           "title" : "This is just a test"
         }
       }
     }'
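
The same request can be prepared from Python's standard library. This is a sketch under the assumption that Solr is listening at the URL used in the curl example; the helper name `build_add_request` is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_add_request(doc, base_url="http://localhost:8983/solr"):
    """Prepare a POST to /update/json that adds one document and commits."""
    body = json.dumps({"add": {"doc": doc}}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/update/json?commit=true",
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


request = build_add_request(
    {"id": "MyTestDocument", "title": "This is just a test"}
)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without a Solr server.
print(request.full_url)
```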
wt=csv

Writes only docs (no response header or response
extras) in CSV format

Roundtrippable with /update/csv

  provided all fields are stored
UIMA
Unstructured Information Management
Architecture

 http://uima.apache.org/

New update processor chain, augmenting
incoming documents from a UIMA annotator
pipeline

 http://wiki.apache.org/solr/SolrUIMA
(solr|lucene)-dev


ant [idea|eclipse]

go!

http://wiki.apache.org/solr/HowToContribute
works in progress

some interesting open issues (with patches):

  PayloadTermQuery

  XMLQueryParser plugin

  join
{!join from=$f to=$t}


insert <what Yonik said>

  https://issues.apache.org/jira/browse/SOLR-2272
Lucid (imagination)
What's Lucid done for you lately -

  Yonik, Mark, Grant, Hoss: Lucene and Solr performance,
  faceting, grouping, join query, spatial, Mahout, ORP, PMC,
  etc, etc, etc

  Other technical staff involved in mailing list assistance, bug
  reporting, contributing patches (hi Lance, Erick, Jay, Tom,
  Grijesh, Tomas....)

  extended dismax, join, faceting performance improvements

  LucidWorks Enterprise
Hoss Simplicity

http://www.lucidimagination.com/blog/2011/01/21/solr-powered-isfdb-part1/

http://www.lucidimagination.com/blog/2011/01/28/solr-powered-isfdb-part-2/
LucidWorks Enterprise
"lucid" query parser

click boosting

tunable norms, per-field

role filtering

administrative UI

REST API

Data sources, crawlers, and scheduling

Alerts

http://www.lucidimagination.com/enterprise-search-solutions/lucidworks
Community Questions


fire away!
resources


duh!: #code4lib

lucene.apache.org/solr

search.lucidimagination.com/?q=<your query>
Q&A: faceting


why is paging through facets the way it is?

  short-circuits on enum
Community:
- The state of Extended DisMax, and what Lucene features
remain incompatible with it.

- Any developments on faceting (I've implemented the
standard workaround to the "unknown facet list size"
problem...  but I'd still love to be able to know exactly how
long the lists are)

- Hierarchical documents in Solr -- I haven't followed the
conversations closely, but I gather that this topic is gaining
some momentum in the Solr community.
contact info
erik.hatcher @ lucidimagination . com

http://www.lucidimagination.com

  webinars, documentation

  LucidFind: search.lucidimagination.com

    search mailing list posts, wiki pages, web
    sites, our blog, etc for latest Lucene/Solr
    assistance
re: code4lib

TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

code4lib 2011 preconference: What's New in Solr (since 1.4.1)

  • 1. What's New in Solr? code4lib 2011 preconference Bloomington, IN presented by Erik Hatcher of Lucid Imagination
  • 2. about me spoken at several code4lib conferences Keynoted Athens '07 along with the pioneering Solr preconference, Providence '09, "Rising Sun" pre-conferenced Asheville '10, "Solr Black Belt" co-authored "Lucene in Action", first edition; ghost/toast on second edition Lucene and Solr committer. library world claims to fame are founding and naming Blacklight, original developer on Collex and the Rossetti Archive search now at Lucid Imagination, dedicated to Lucene/Solr support/services/training/etc
  • 3. abstract The library world is fired up about Solr. Practically every next-gen catalog is using it (via Blacklight, VuFind, or other technologies). Solr has continued improving in some dramatic ways, including geospatial support, field collapsing/grouping, extended dismax query parsing, pivot/ grid/matrix/tree faceting, autosuggest, and more. This session will cover all of these new features, showcasing live examples of them all, including anything new that is implemented prior to the conference.
  • 4. LIA2 - Lucene in Action Published: July 2010 - http://www.manning.com/lucene/ New in this second edition: Performing hot backups Using numeric fields Tuning for indexing or searching speed Boosting matches with payloads Creating reusable analyzers Adding concurrency with threads Four new case studies, and more
  • 5. Version Number Which one ya talking 'bout, Willis? 3.1? 4.0?? TRUNK?? playing with fire index format changes to be expected reindexing recommended/required Solr/Lucene merged development codebases releases should occur lock-step moving forward
  • 6. dependencies November 2009: Solr 1.4 (Lucene 2.9.1) June 2010: Solr 1.4.1 (Lucene 2.9.3) Spring 2011(?): Solr 3.1 (Lucene 3.1) TRUNK: Solr 4.x (Lucene TRUNK)
  • 7. lucene per-segment field cache, etc Unicode and analysis improvements throughout Analysis "attributes" AutomatonQuery: RegexpQuery, WildcardQuery flexible indexing and so much more!
  • 8. README Reindex! Upgrade SolrJ libraries too (javabin format changed) Read Lucene and Solr's CHANGES.txt files for all the details
  • 10. Standard tokenization ClassicTokenizer: old StandardTokenizer StandardTokenizer: now uses Unicode text segmentation specified by UAX#29 UAX29URLEmailTokenizer maxTokenLength: default=255
  • 12. CollationKeyFilter
    A filter that lets one specify:
      A system collator associated with a locale, or
      A collator based on custom rules
    This can be used for changing sort order for non-English languages as well as to modify the collation sequence for certain languages.
    You must use the same CollationKeyFilter at both index-time and query-time for correct results. Also, the JVM vendor and version (including patch version) of the slave should be exactly the same as those of the master (or indexer) for consistent results.
    http://wiki.apache.org/solr/UnicodeCollation
    see also: ICUCollationKeyFilter
  • 13. ICU International Components for Unicode ICUFoldingFilter ICUNormalizer2Filter name=nfc|nfkc|nfkc_cf mode=compose|decompose filter
  • 14. ICUFoldingFilter Accent removal, case folding, canonical duplicates folding, dashes folding, diacritic removal (including stroke, hook, descender), Greek letterforms folding, Han Radical folding, Hebrew Alternates folding, Jamo folding, Letterforms folding, Math symbol folding, Multigraph Expansions: All, Native digit folding, No-break folding, Overline folding, Positional forms folding, Small forms folding, Space folding, Spacing Accents folding, Subscript folding, Superscript folding, Suzhou Numeral folding, Symbol folding, Underline folding, Vertical forms folding, Width folding. Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
  • 15. ICUTransformFilter
    id: specific transliterator identifier from ICU's Transliterator#getAvailableIDs() (required)
    direction=forward|reverse
    Examples:
      Traditional-Simplified: =>
      Cyrillic-Latin: Российская Федерация => Rossijskaâ Federaciâ
  • 17. highlighter deprecated old config, now config as standard search component FastVectorHighlighter
  • 18. FastVectorHighlighter
    if termVectors="true", termPositions="true", and termOffsets="true"
    and hl.useFastVectorHighlighter=true
    hl.fragListBuilder
    hl.fragmentsBuilder
  • 19. spatial JTeam's plugin: packaged for easy deployment Solr trunk capabilities many distance functions What's missing? geo faceting? scoring by distance? distance pseudo-field? All units in kilometers, unless otherwise specified
  • 20. Spatial field types Point: n-dimensional, must specify dimension (default=2), represented by N subfields internally LatLon: latitude,longitude, represented by two subfields internally, single valued only GeoHash: single string representation of lat/lon
  • 21. Spatial query parsers geofilt: exact filtering bbox: uses (trie) range queries Parameters: sfield: spatial field pt: reference point d: distance
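The parameters on this slide can be assembled into a request like the following. This is only a sketch: it assumes the parser is applied as a filter query (`fq={!geofilt}` or `fq={!bbox}`) with `sfield`, `pt`, and `d` passed as top-level parameters, and it borrows the field name `store` and the reference point from the sort-by-function slide. The helper name `spatial_filter_params` is made up for illustration.

```python
from urllib.parse import urlencode


def spatial_filter_params(parser, sfield, lat, lon, d_km):
    """Build request parameters for a geofilt or bbox filter (hypothetical helper)."""
    if parser not in ("geofilt", "bbox"):
        raise ValueError("parser must be 'geofilt' or 'bbox'")
    return {
        "q": "*:*",
        "fq": "{!%s}" % parser,       # geofilt: exact filtering; bbox: range queries
        "sfield": sfield,             # spatial field
        "pt": "%s,%s" % (lat, lon),   # reference point as "lat,lon"
        "d": str(d_km),               # distance, in kilometers per the spatial slide
    }


params = spatial_filter_params("geofilt", "store", 39.194564, -86.432947, 5)
print(urlencode(params))  # query string to append to /solr/select?
```

Swapping `"geofilt"` for `"bbox"` keeps the other parameters unchanged, which makes it easy to compare the exact filter against the cheaper bounding box.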
  • 22. field collapsing/grouping
    backwards compatibility mode?
    http://wiki.apache.org/solr/FieldCollapsing
    group=true
    group.field / group.func / group.query
    rows / start: for groups, not documents
    group.limit: number of results per group
    group.offset: offset into doclist of each group
    sort: how to sort groups, by top document in each group
    group.sort: how to sort docs within each group
    group.format: grouped | simple
    group.main=true|false
    faceting works as normal
    not distributed savvy yet
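As a rough illustration of how the grouping parameters fit together, the sketch below builds a parameter list for a grouped request. The helper `grouping_params` and the field name `manu_exact` are hypothetical; only the parameter names come from the slide.

```python
from urllib.parse import urlencode


def grouping_params(query, field, limit=1, offset=0, fmt="grouped", main=False):
    """Return (name, value) pairs for a field-grouped request (hypothetical helper)."""
    if fmt not in ("grouped", "simple"):
        raise ValueError("group.format must be 'grouped' or 'simple'")
    return [
        ("q", query),
        ("group", "true"),
        ("group.field", field),
        ("group.limit", str(limit)),    # number of results per group
        ("group.offset", str(offset)),  # offset into the doclist of each group
        ("group.format", fmt),
        ("group.main", "true" if main else "false"),
    ]


# 'manu_exact' is an assumed field name, not taken from the slides.
pairs = grouping_params("ipod", "manu_exact", limit=3)
print(urlencode(pairs))
```

Remember that `rows` and `start` then page through groups rather than documents.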
  • 23. query parsing TextField: autoGeneratePhraseQueries="true" if single string analyzes to multiple tokens
  • 24. {!raw|term|field f=$f}...
    Recall why we needed {!raw} from last year
    <fieldType = .../> - use one string, one numeric, (and one text?)
    <field name="..."/>
    table for numeric and for string (and text?):
      {!raw f=$f}   | TermQuery(...)
      {!term f=$f}  | ...
      {!field f=$f} | ...
    Which to use when? {!raw} works for strings just fine, but best to migrate to the generally safer/wiser {!term} for future-proofing.
  • 26. dismax q.op or schema.xml's <solrQueryParser defaultOperator="[AND|OR]"/> defaults mm to 0% (OR) or 100% (AND) #code4lib: issues with non-analyzed fields in qf
  • 27. edismax
    Supports full lucene query syntax in the absence of syntax errors
    Supports "and"/"or" to mean "AND"/"OR" in lucene syntax mode
    When there are syntax errors, improved smart partial escaping of special characters is done to prevent them. In this mode, fielded queries, +/-, and phrase queries are still supported.
    Improved proximity boosting via word bigrams. This prevents the problem of needing 100% of the words in the document to get any boost, as well as having all of the words in a single field.
    Advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required.
    Supports the "boost" parameter: like the dismax bf param, but multiplies the function query instead of adding it in
    Supports pure negative nested queries, so a query like +foo (-foo) will match all documents
  • 28. function queries termfreq, tf, docfreq, idf, norm, maxdoc, numdocs {!func}termfreq(text,ipod) standard java.util.Math functions
  • 29. faceting
    per-segment, single-valued fields: facet.method=fcs (field cache per segment)
      facet.field={!threads=-1}field_name
      threads=0: direct execution
      threads=-1: thread per segment
    speeds up single and multivalued method=fc, especially for deep paging with facet.offset
    date faceting improvements, generalized for numeric ranges too
    can now exclude main query
      q={!tag=main}the+query&facet.field={!ex=main}category
  • 30. pivot/grid/matrix/tree faceting is this also "hierarchical faceting"? it depends!
  • 32. spell checking DirectSolrSpellChecker no external index needed, uses automaton on main index
  • 33. spellcheck config
    solrconfig.xml
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textgen</str>
      <!-- a spellchecker that uses no auxiliary index -->
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="minPrefix">1</str>
      </lst>
    </searchComponent>
  • 34. spellcheck handler
    solrconfig.xml
    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <!-- default values for query parameters -->
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <str name="spellcheck">true</str>
        <str name="spellcheck.collate">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>
  • 35. spellcheck response
    http://localhost:8983/solr/select?q=ipud%20bluck&wt=ruby&indent=on
    {
      'responseHeader'=>{
        'status'=>0,
        'QTime'=>10,
        'params'=>{
          'indent'=>'on',
          'wt'=>'ruby',
          'q'=>'ipud bluck'}},
      'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
      },
      'spellcheck'=>{
        'suggestions'=>[
          'ipud',{
            'numFound'=>1,
            'startOffset'=>0,
            'endOffset'=>4,
            'suggestion'=>['ipod']},
          'bluck',{
            'numFound'=>1,
            'startOffset'=>5,
            'endOffset'=>10,
            'suggestion'=>['black']},
          'collation','ipod black']}}
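The `suggestions` value in this response is a flat list that alternates names and values, so a client has to pair the entries up before using them. The sketch below does that on a hand-transcribed copy of the slide's `spellcheck` section; the helper names are invented for illustration, and it assumes a JSON or Python response writer would return the same alternating layout as the Ruby output shown.

```python
def pair_up(flat):
    """Turn a flat [name, value, name, value, ...] list into (name, value) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def corrections(spellcheck):
    """Return ({misspelled term: [suggestions]}, collation or None)."""
    fixes, collation = {}, None
    for name, value in pair_up(spellcheck["suggestions"]):
        if name == "collation":
            collation = value
        else:
            fixes[name] = value["suggestion"]
    return fixes, collation


# The 'spellcheck' section from the slide, transcribed into Python literals.
spellcheck = {
    "suggestions": [
        "ipud", {"numFound": 1, "startOffset": 0, "endOffset": 4,
                 "suggestion": ["ipod"]},
        "bluck", {"numFound": 1, "startOffset": 5, "endOffset": 10,
                  "suggestion": ["black"]},
        "collation", "ipod black",
    ]
}

fixes, collation = corrections(spellcheck)
print(fixes)
print(collation)
```

One caveat of this simple approach: a query term that is literally `collation` would be mistaken for the collation entry.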
  • 36. autosuggest new "spellcheck" component, builds TST collates query can check if collated suggestions yield results, optionally, providing hit count
  • 37. suggest config
    solrconfig.xml
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textgen</str>
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
        <str name="field">suggest</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>
    schema.xml
    <field name="suggest" type="textgen" indexed="true" stored="false"/>
    <copyField source="name" dest="suggest"/>
  • 38. suggest handler
    solrconfig.xml
    <requestHandler class="solr.SearchHandler" name="/suggest">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.count">10</str>
        <str name="rows">0</str>
        <str name="spellcheck.maxCollationTries">20</str>
        <str name="spellcheck.maxCollations">10</str>
        <str name="spellcheck.collateExtendedResults">true</str>
      </lst>
      <arr name="components">
        <str>query</str> <!-- to allow suggestion hit counts to be returned -->
        <str>spellcheck</str>
      </arr>
    </requestHandler>
  • 39. suggest response
    http://localhost:8983/solr/suggest?q=ip&wt=ruby&indent=on
    {
      'responseHeader'=>{
        'status'=>0,
        'QTime'=>2},
      'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
      },
      'spellcheck'=>{
        'suggestions'=>[
          'ip',{
            'numFound'=>1,
            'startOffset'=>0,
            'endOffset'=>2,
            'suggestion'=>['ipod']},
          'collation',[
            'collationQuery','ipod',
            'hits',3,
            'misspellingsAndCorrections',[
              'ip','ipod']]]}}
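With `spellcheck.collateExtendedResults=true`, each collation is itself a flat name/value list carrying the rewritten query and its hit count. A minimal sketch of pulling those out for an autosuggest dropdown follows; it works on a hand-transcribed copy of the slide's response, and `collations_with_hits` is a made-up helper name.

```python
def flat_to_dict(flat):
    """Convert a flat [name, value, ...] list into a dict (names assumed unique)."""
    return dict(zip(flat[0::2], flat[1::2]))


def collations_with_hits(suggestions):
    """Collect (collationQuery, hits) for every 'collation' entry, in order."""
    found = []
    for name, value in zip(suggestions[0::2], suggestions[1::2]):
        if name == "collation":
            entry = flat_to_dict(value)
            found.append((entry["collationQuery"], entry["hits"]))
    return found


# The 'suggestions' list from the slide, transcribed into Python literals.
suggestions = [
    "ip", {"numFound": 1, "startOffset": 0, "endOffset": 2,
           "suggestion": ["ipod"]},
    "collation", ["collationQuery", "ipod",
                  "hits", 3,
                  "misspellingsAndCorrections", ["ip", "ipod"]],
]

print(collations_with_hits(suggestions))
```

The outer list is walked pairwise rather than converted to a dict because `spellcheck.maxCollations` greater than 1 can yield several entries that all share the name `collation`.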
  • 40. sort by function
    &q=*:*&sfield=store&pt=39.194564,-86.432947&sort=geodist() asc
    but still can't get value of function back unless you force it to be the score somehow
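To make the effect of `sort=geodist() asc` concrete, here is a client-side sketch of the same ordering. It assumes the distance is a great-circle (haversine) distance in kilometers; the Earth radius constant and the two store records are illustrative assumptions, not values taken from Solr or from the slides.

```python
import math

# Assumption: mean Earth radius in km; the constant Solr uses may differ slightly.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


pt = (39.194564, -86.432947)  # the reference point from the slide

# Hypothetical documents: (id, latitude, longitude).
stores = [
    ("far", 40.71, -74.00),
    ("near", 39.17, -86.52),
]

# Equivalent of sort=geodist() asc: nearest store first.
by_distance = sorted(stores, key=lambda s: haversine_km(pt[0], pt[1], s[1], s[2]))
print([s[0] for s in by_distance])
```

Computing the distance on the client like this is also one way around the limitation noted on the slide, since the function value itself is not returned with the documents.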
  • 41. clustering component now works out-of-the-box; all Apache license compatible supports distributed search
  • 43. structured explain
    http://localhost:8983/solr/select?q=title:solr&debug.explain.structured=true&debug=results&wt=ruby&indent=on
    'debug'=>{
      'explain'=>{
        'doc1'=>{
          'match'=>true,
          'value'=>0.076713204,
          'description'=>'fieldWeight(title:solr in 0), product of:',
          'details'=>[{
              'match'=>true,
              'value'=>1.0,
              'description'=>'tf(termFreq(title:solr)=1)'},
            {
              'match'=>true,
              'value'=>0.30685282,
              'description'=>'idf(docFreq=1, maxDocs=1)'},
            {
              'match'=>true,
              'value'=>0.25,
              'description'=>'fieldNorm(field=title, doc=0)'}]}}}}
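Because the explanation arrives as nested data rather than a text blob, a client can check the arithmetic directly. The sketch below transcribes the `doc1` node from the slide and confirms that its value is the product of its details. The idf check assumes the classic Lucene default similarity formula, 1 + ln(maxDocs / (docFreq + 1)); that formula is an assumption here, not something stated on the slide.

```python
import math


def is_product_of_details(node, rel_tol=1e-6):
    """True if node['value'] matches the product of its detail values."""
    product = math.prod(d["value"] for d in node["details"])
    return math.isclose(product, node["value"], rel_tol=rel_tol)


def classic_idf(doc_freq, max_docs):
    """Assumed classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# The 'doc1' explanation from the slide, transcribed into Python literals.
doc1 = {
    "match": True,
    "value": 0.076713204,
    "description": "fieldWeight(title:solr in 0), product of:",
    "details": [
        {"match": True, "value": 1.0,
         "description": "tf(termFreq(title:solr)=1)"},
        {"match": True, "value": 0.30685282,
         "description": "idf(docFreq=1, maxDocs=1)"},
        {"match": True, "value": 0.25,
         "description": "fieldNorm(field=title, doc=0)"},
    ],
}

print(is_product_of_details(doc1))
print(classic_idf(doc_freq=1, max_docs=1))
```

The loose tolerance is deliberate: the values in the response are single-precision floats, so an exact equality test would be too strict.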
  • 44. SolrCloud shared/central config and core/shard management via ZooKeeper, built-in load balancing, and infrastructure for future SolrCloud work.
  • 45. /update/json
    solrconfig.xml
    <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler"/>

    curl 'http://localhost:8983/solr/update/json?commit=true' -H 'Content-type:application/json' -d '
    {
      "add": {
        "doc": {
          "id" : "MyTestDocument",
          "title" : "This is just a test"
        }
      }
    }'
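When the document comes from application data rather than a hand-written string, it is safer to let a JSON library build the request body so quoting and escaping are handled for you. A small sketch, with `add_command` as a made-up helper name; it only produces the body and leaves the HTTP POST (to the URL and with the Content-type header shown in the curl example) to whatever client you prefer.

```python
import json


def add_command(doc):
    """Wrap one document in the {"add": {"doc": ...}} envelope from the slide."""
    return json.dumps({"add": {"doc": doc}})


body = add_command({"id": "MyTestDocument", "title": "This is just a test"})
print(body)
```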
  • 46. wt=csv Writes only docs (no response header or response extras) in CSV format Roundtrippable with /update/csv provided all fields are stored
  • 47. UIMA Unstructured Information Management Architecture http://uima.apache.org/ New update processor chain, augmenting incoming documents from a UIMA annotator pipeline http://wiki.apache.org/solr/SolrUIMA
  • 49. works in progress some interesting open issues (with patches): PayloadTermQuery XMLQueryParser plugin join
  • 50. {!join from=$f to=$t} insert <what Yonik said> https://issues.apache.org/jira/browse/SOLR-2272
  • 51. Lucid (imagination) What's Lucid done for you lately - Yonik, Mark, Grant, Hoss: Lucene and Solr performance, faceting, grouping, join query, spatial, Mahout, ORP, PMC, etc, etc, etc Other technical staff involved in mailing list assistance, bug reporting, contributing patches (hi Lance, Erick, Jay, Tom, Grijesh, Tomas....) extended dismax, join, faceting performance improvements LucidWorks Enterprise
  • 53. LucidWorks Enterprise
    "lucid" query parser
    REST API
    click boosting
    Data sources, crawlers, and scheduling
    tunable norms, per-field
    Alerts
    role filtering
    administrative UI
    http://www.lucidimagination.com/enterprise-search-solutions/lucidworks
  • 56. Q&A: faceting why is paging through facets the way it is? short-circuits on enum
  • 57. Community: - The state of Extended DisMax, and what Lucene features remain incompatible with it. - Any developments on faceting (I've implemented the standard workaround to the "unknown facet list size" problem...  but I'd still love to be able to know exactly how long the lists are) - Hierarchical documents in Solr -- I haven't followed the conversations closely, but I gather that this topic is gaining some momentum in the Solr community.
  • 58. contact info erik.hatcher @ lucidimagination . com http://www.lucidimagination.com webinars, documentation LucidFind: search.lucidimagination.com search mailing list posts, wiki pages, web sites, our blog, etc for latest Lucene/Solr assistance