Lucene for Solr
  Developers
  NFJS - Boston, September 2011
     Presented by Erik Hatcher
erik.hatcher@lucidimagination.com
         Lucid Imagination
 http://www.lucidimagination.com
About me...
• Co-author, "Lucene in Action" (and "Java
  Development with Ant" / "Ant in Action"
  once upon a time)
• "Apache guy" - Lucene/Solr committer;
  member of Lucene PMC, member of
  Apache Software Foundation
• Co-founder, evangelist, trainer, coder @
  Lucid Imagination
About Lucid Imagination...
•   Lucid Imagination provides commercial-grade
    support, training, high-level consulting and value-
    added software for Lucene and Solr.

•   We make Lucene ‘enterprise-ready’ by offering:

    •   Free, certified distributions and downloads.

    •   Support, training, and consulting.

    •   LucidWorks Enterprise, a commercial search
        platform built on top of Solr.
What is Lucene?
•   An open source search library (not an application)

•   100% Java

•   Continuously improved and tuned over more than
    10 years

•   Compact, portable index representation

•   Programmable text analyzers, spell checking and
    highlighting

•   Not a crawler or a text extraction tool
Inverted Index
•   Lucene stores input data in what is known as an
    inverted index

•   In an inverted index each indexed term points to a
    list of documents that contain the term

•   Similar to the index provided at the end of a book

•   In this case "inverted" simply means that terms point to
    documents, rather than documents pointing to terms

•   It is much faster to find a term in an index, than to
    scan all the documents
Inverted Index Example
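A minimal sketch of the idea, assuming a toy in-memory map rather than Lucene's actual index format. The class name InvertedIndexSketch and its methods are hypothetical, and the regex split only stands in for a real Analyzer:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of an inverted index; NOT Lucene's API or on-disk format.
class InvertedIndexSketch {
    // term -> ids of the documents that contain the term
    private final Map<String, List<Integer>> postings = new TreeMap<String, List<Integer>>();

    // Lower-cases and splits on anything that is not a letter or digit.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) continue;
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new ArrayList<Integer>();
                postings.put(term, docs);
            }
            // Each document is added in a single call, so checking the tail
            // of the list is enough to avoid recording the same doc twice.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // A lookup is one map access; no document text is scanned.
    List<Integer> search(String term) {
        List<Integer> docs = postings.get(term.toLowerCase());
        return docs == null ? Collections.<Integer>emptyList() : docs;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene in Action");
        index.add(2, "Ant in Action");
        index.add(3, "Lucene for Solr Developers");
        System.out.println("lucene -> " + index.search("lucene"));
        System.out.println("action -> " + index.search("action"));
    }
}
```

Looking up a term costs one map access regardless of how many documents were indexed, which is the point of the last bullet on the previous slide.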
Segments and Merging
•   A Lucene index is a collection of one or more sub-indexes
    called segments

•   Each segment is a fully independent index

•   A multi-way merge algorithm is used to periodically merge
    segments

•   New segments are created when an IndexWriter flushes new
    documents and pending deletes to disk

•   The goal is a balance between large-scale search performance
    and small-scale (incremental) updates

•   Optimization merges all segments into one
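A rough model of what a merge achieves, assuming segments are plain in-memory maps and document ids are unique across segments (real Lucene renumbers documents and merges sorted on-disk structures). SegmentMergeSketch and its members are hypothetical names, and this is not the multi-way merge algorithm itself:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model: several independent segments collapse into one, dropping deleted docs.
class SegmentMergeSketch {
    static class Segment {
        final Map<String, List<Integer>> postings = new TreeMap<String, List<Integer>>();
        final Set<Integer> deleted = new HashSet<Integer>(); // deletes are only marked here

        void add(String term, int docId) {
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new ArrayList<Integer>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
    }

    // Copies every live posting into a fresh segment; deleted docs are left behind.
    static Segment merge(List<Segment> segments) {
        Segment merged = new Segment();
        for (Segment segment : segments) {
            for (Map.Entry<String, List<Integer>> entry : segment.postings.entrySet()) {
                for (int docId : entry.getValue()) {
                    if (!segment.deleted.contains(docId)) {
                        merged.add(entry.getKey(), docId);
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Segment first = new Segment();
        first.add("lucene", 1);
        first.add("lucene", 2);
        first.add("solr", 2);
        first.deleted.add(2); // doc 2 is deleted but still stored in the segment

        Segment second = new Segment();
        second.add("lucene", 3);

        List<Segment> all = new ArrayList<Segment>();
        all.add(first);
        all.add(second);
        System.out.println(merge(all).postings);
    }
}
```

The merged segment carries no deletion marks at all, which is why merging (and optimization in particular) reclaims the space held by deleted documents.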
Segments and Merging
Segments
• A deleted document is only marked as deleted; it
  still exists in its index segment until that
  segment is merged
• Newly added Documents are buffered in memory and,
  at certain trigger points, flushed to the
  Directory as a new segment
• A flush can be forced by calling commit
• Segments are periodically merged
IndexSearcher
Adding new documents
Commit
Committed and Warmed
Lucene Scoring

•   Lucene uses a similarity scoring formula to rank results by measuring the
    similarity between a query and the documents that match the query. The
    factors that form the scoring formula are:

    •   Term Frequency: tf (t in d). How often the term occurs in the document.

    •   Inverse Document Frequency: idf (t). A measure of how rare the term is in
        the whole collection, based on the inverse of the number of documents
        that contain the term (log-scaled in Lucene's default Similarity).

    •   Terms that are rare throughout the entire collection score higher.
Coord and Norms
•   Coord: The coordination factor, coord (q, d).
    Boosts documents that match more of the
    search terms than other documents.
    •   If 4 of 4 terms match coord = 4/4
    •   If 3 of 4 terms match coord = 3/4
•   Length Normalization - Adjust the score based
    on length of fields in the document.
    •   shorter fields that match get a boost
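The two factors on this slide can be sketched as plain arithmetic. This assumes the formulas used by Lucene 3.x's DefaultSimilarity (coord as a simple ratio, length norm as one over the square root of the field's term count) and ignores index-time boosts and the lossy single-byte encoding of norms; the class name is hypothetical:

```java
// Arithmetic sketch only; not a Lucene Similarity implementation.
class CoordAndNormSketch {
    // Fraction of the query terms that the document matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Shorter fields get a larger value, hence a higher score.
    static float lengthNorm(int numTermsInField) {
        return (float) (1.0 / Math.sqrt(numTermsInField));
    }

    public static void main(String[] args) {
        System.out.println("coord(4 of 4) = " + coord(4, 4));
        System.out.println("coord(3 of 4) = " + coord(3, 4));
        System.out.println("lengthNorm(4 terms)  = " + lengthNorm(4));
        System.out.println("lengthNorm(16 terms) = " + lengthNorm(16));
    }
}
```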
Scoring Factors (cont)
• Boost: (t.field in d). A way to boost a field
  or a whole document above others.
• Query Norm: (q). Normalization value
  for a query, given the sum of the squared
  weights of each of the query terms.
• You will often hear the Lucene scoring
  simply referred to as
  TF·IDF.
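Putting the factors from these slides together, the practical scoring function documented for the default Similarity in Lucene 3.x can be written as below. The LaTeX notation is a transcription, not taken from the slides; boost(t) stands for the query-time boost of term t, and norm(t,d) folds together index-time boosts and length normalization:

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot
  \mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big),
\qquad
\mathrm{queryNorm}(q) \;=\; \frac{1}{\sqrt{\text{sumOfSquaredWeights}}}
```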
Explanation

      • Lucene has a feature called Explanation
      • Solr uses the debugQuery parameter to
         retrieve scoring explanations

0.2987913 =   (MATCH) fieldWeight(text:lucen in 688), product of:
  1.4142135   = tf(termFreq(text:lucen)=2)
  9.014501    = idf(docFreq=3, maxDocs=12098)
  0.0234375   = fieldNorm(field=text, doc=688)
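The numbers in this explanation can be reproduced by hand. The sketch below assumes the DefaultSimilarity formulas of Lucene 3.x (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))) and takes fieldNorm as given; ExplainArithmetic is a hypothetical name, and the code uses doubles where Lucene uses floats:

```java
// Recomputes the fieldWeight line of the explanation from its inputs.
class ExplainArithmetic {
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    static double idf(int docFreq, int maxDocs) {
        return Math.log(maxDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double fieldWeight(int termFreq, int docFreq, int maxDocs, double fieldNorm) {
        return tf(termFreq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        System.out.println("tf          = " + tf(2));
        System.out.println("idf         = " + idf(3, 12098));
        System.out.println("fieldWeight = " + fieldWeight(2, 3, 12098, 0.0234375));
    }
}
```

The three printed values agree with the explanation output to within floating-point rounding: roughly 1.4142, 9.0145 and 0.29879.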
Lucene Core
• IndexWriter
• Directory
• IndexReader, IndexSearcher
• analysis: Analyzer, TokenStream,
  Tokenizer, TokenFilter
• Query
Solr Architecture
Customizing - Don't do it!

•   Unless you need to.
•   In other words... ensure you've given the built-in
    capabilities a try, asked on the e-mail list, and
    spelunked into at least Solr's code a bit to make
    some sense of the situation.
•   But we're here to roll up our sleeves, because we
    need to...
But first...
•   Look at Lucene and/or Solr source code as
    appropriate

•   Carefully read javadocs and wiki pages - lots of tips
    there

•   And, hey, search for what you're trying to do...

    •   Google, of course

    •   But try out LucidFind and other Lucene ecosystem
        specific search systems -
        http://www.lucidimagination.com/search/
Extension points
•   Tokenizer, TokenFilter, CharFilter
•   SearchComponent
•   RequestHandler
•   ResponseWriter
•   FieldType
•   Similarity
•   QParser
•   DataImportHandler hooks
    •   data sources
    •   entity processors
    •   transformers
•   several others
Factories
• FooFactory (most) everywhere.
  Sometimes there's BarPlugin style

• for sake of discussion... let's just skip the
  "factory" part
• In Solr, Factories and Plugins are used by
  configuration loading to parameterize and
  construct the actual components
"Installing" plugins
• Compile .java to .class, JAR it up
• Put JAR files in either:
 • <solr-home>/lib
 • a shared lib when using multicore
 • anywhere, and register location in
    solrconfig.xml
• Hook in plugins as appropriate
Multicore sharedLib

<solr sharedLib="/usr/local/solr/customlib"
       persistent="true">
   <cores adminPath="/admin/cores">
      <core instanceDir="core1" name="core1"/>
      <core instanceDir="core2" name="core2"/>
   </cores>
</solr>
Plugins via solrconfig.xml


• <lib dir="/path/to/your/custom/jars" />
Analysis

• CharFilter
• Tokenizer
• TokenFilter
Primer

• Tokens, Terms
• Attributes: Type, Payloads, Offsets,
  Positions, Term Vectors
• part of the picture:
Version

• enum:
 • Version.LUCENE_31,
    Version.LUCENE_32, etc
• Version.onOrAfter(Version other)
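Why a version enum matters: analysis components accept a "match version" so that an index built under older behavior keeps being analyzed the same way. A minimal stand-in follows, assuming a hypothetical local enum (this is not org.apache.lucene.util.Version) and made-up behavior labels:

```java
// Stand-in for the version-gating pattern; the enum here is NOT Lucene's class.
class VersionGateSketch {
    enum Version {
        LUCENE_30, LUCENE_31, LUCENE_32;

        // Enum constants are declared oldest first, so ordinal order is release order.
        boolean onOrAfter(Version other) {
            return compareTo(other) >= 0;
        }
    }

    // A component picks its behavior from the version the caller asked to match.
    static String describe(Version matchVersion) {
        return matchVersion.onOrAfter(Version.LUCENE_31) ? "new behavior" : "legacy behavior";
    }

    public static void main(String[] args) {
        for (Version v : Version.values()) {
            System.out.println(v + " -> " + describe(v));
        }
    }
}
```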
CharFilter
• extend BaseCharFilter
• enables pre-tokenization filtering/morphing
  of incoming field value
• only affects tokenization, not stored value
• Built-in CharFilters: HTMLStripCharFilter,
  PatternReplaceCharFilter, and
  MappingCharFilter
Tokenizer
•   common to extend CharTokenizer

•   implement -

    •   protected abstract boolean isTokenChar(int c);

•   optionally override -

    •   protected int normalize(int c)

•   extend Tokenizer directly for finer control

•   Popular built-in Tokenizers include: WhitespaceTokenizer,
    StandardTokenizer, PatternTokenizer, KeywordTokenizer,
    ICUTokenizer
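The isTokenChar / normalize contract can be illustrated without Lucene on the classpath. The sketch below is a simplified model, assuming a hypothetical CharTokenizerSketch base class that returns a list of strings; the real CharTokenizer is a streaming TokenStream that also tracks offsets and attributes:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the CharTokenizer contract; not the Lucene class.
abstract class CharTokenizerSketch {
    // Subclasses decide which code points belong inside a token.
    protected abstract boolean isTokenChar(int c);

    // Subclasses may rewrite each code point, e.g. to lower-case it.
    protected int normalize(int c) {
        return c;
    }

    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int c = input.codePointAt(i);
            i += Character.charCount(c); // step by code point, not by char
            if (isTokenChar(c)) {
                current.appendCodePoint(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-token char ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Example subclass: tokens are runs of letters, lower-cased.
    static CharTokenizerSketch lowerCaseLetters() {
        return new CharTokenizerSketch() {
            @Override
            protected boolean isTokenChar(int c) {
                return Character.isLetter(c);
            }

            @Override
            protected int normalize(int c) {
                return Character.toLowerCase(c);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(lowerCaseLetters().tokenize("Lucene 3.x & Solr!"));
    }
}
```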
TokenFilter

• a TokenStream whose input is another
  TokenStream
• Popular TokenFilters include:
  LowerCaseFilter, CommonGramsFilter,
  SnowballFilter, StopFilter,
  WordDelimiterFilter
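"A TokenStream whose input is another TokenStream" is the decorator pattern. The following is a stdlib-only sketch, assuming tokens are modeled as an Iterator of strings; the class and method names are hypothetical, and the stop word list is made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Decorator-style filter chain; models the TokenFilter idea, not Lucene's API.
class TokenFilterChainSketch {
    // Wraps a token stream and lower-cases every token it passes through.
    static Iterator<String> lowerCase(final Iterator<String> input) {
        return new Iterator<String>() {
            public boolean hasNext() { return input.hasNext(); }
            public String next() { return input.next().toLowerCase(); }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Wraps a token stream and silently drops stop words.
    static Iterator<String> stop(final Iterator<String> input, final Set<String> stopWords) {
        return new Iterator<String>() {
            private String pending = advance(); // next surviving token, or null

            private String advance() {
                while (input.hasNext()) {
                    String token = input.next();
                    if (!stopWords.contains(token)) return token;
                }
                return null;
            }

            public boolean hasNext() { return pending != null; }

            public String next() {
                if (pending == null) throw new NoSuchElementException();
                String token = pending;
                pending = advance();
                return token;
            }

            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Whitespace "tokenizer" -> lower-case filter -> stop filter.
    static List<String> analyze(String text) {
        Iterator<String> tokenizer = Arrays.asList(text.trim().split("\\s+")).iterator();
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "a", "of"));
        Iterator<String> chain = stop(lowerCase(tokenizer), stopWords);
        List<String> out = new ArrayList<String>();
        while (chain.hasNext()) {
            out.add(chain.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Art of Search"));
    }
}
```

Order matters in such a chain: the stop filter sits after the lower-case filter, so "The" is normalized to "the" before it is compared against the stop word set.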
Lucene's analysis APIs
• tricky business, what with Attributes
  (AttributeSource, AttributeFactory), State,
  characters, code points, Version, etc...
• Test!!!
 • BaseTokenStreamTestCase
 • Look at Lucene and Solr's test cases
Solr's Analysis Tools

• Admin analysis tool
• Field analysis request handler
• DEMO
Query Parsing


• String -> org.apache.lucene.search.Query
QParserPlugin
public abstract class QParserPlugin
    implements NamedListInitializedPlugin {

    public abstract QParser createParser(
      String qstr,
      SolrParams localParams,
      SolrParams params,
      SolrQueryRequest req);
}
QParser
public abstract class QParser {

    public abstract Query parse()
              throws ParseException;

}
Built-in QParsers
from QParserPlugin.java
  /** internal use - name to class mappings of builtin parsers */
  public static final Object[] standardPlugins = {
     LuceneQParserPlugin.NAME, LuceneQParserPlugin.class,
     OldLuceneQParserPlugin.NAME, OldLuceneQParserPlugin.class,
     FunctionQParserPlugin.NAME, FunctionQParserPlugin.class,
     PrefixQParserPlugin.NAME, PrefixQParserPlugin.class,
     BoostQParserPlugin.NAME, BoostQParserPlugin.class,
     DisMaxQParserPlugin.NAME, DisMaxQParserPlugin.class,
     ExtendedDismaxQParserPlugin.NAME, ExtendedDismaxQParserPlugin.class,
     FieldQParserPlugin.NAME, FieldQParserPlugin.class,
     RawQParserPlugin.NAME, RawQParserPlugin.class,
     TermQParserPlugin.NAME, TermQParserPlugin.class,
     NestedQParserPlugin.NAME, NestedQParserPlugin.class,
     FunctionRangeQParserPlugin.NAME, FunctionRangeQParserPlugin.class,
     SpatialFilterQParserPlugin.NAME, SpatialFilterQParserPlugin.class,
     SpatialBoxQParserPlugin.NAME, SpatialBoxQParserPlugin.class,
     JoinQParserPlugin.NAME, JoinQParserPlugin.class,
  };
Local Parameters

• {!qparser_name param=value}expression
 • or
• {!qparser_name param=value v=expression}
• Can substitute $references from request
  parameters
Param Substitution
solrconfig.xml
<requestHandler name="/document"
                class="solr.SearchHandler">
  <lst name="invariants">
    <str name="q">{!term f=id v=$id}</str>
  </lst>
</requestHandler>

Solr request
http://localhost:8983/solr/document?id=FOO37
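To make the substitution concrete, here is a deliberately simplified model of how a local-params prefix could be resolved against the request parameters. It assumes unquoted key=value pairs separated by whitespace and stores the parser name under "type" and the query text under "v"; LocalParamsSketch is a hypothetical class, not Solr's parsing code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of {!parser key=value}expression handling; not Solr's implementation.
class LocalParamsSketch {
    static Map<String, String> parse(String q, Map<String, String> requestParams) {
        if (!q.startsWith("{!")) {
            throw new IllegalArgumentException("no local params prefix: " + q);
        }
        int end = q.indexOf('}');
        if (end < 0) {
            throw new IllegalArgumentException("missing closing brace: " + q);
        }
        String[] parts = q.substring(2, end).trim().split("\\s+");
        Map<String, String> local = new LinkedHashMap<String, String>();
        local.put("type", parts[0]); // first word names the query parser
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) continue; // this sketch ignores anything that is not key=value
            String key = parts[i].substring(0, eq);
            String value = parts[i].substring(eq + 1);
            if (value.startsWith("$")) {
                // $name is replaced by the request parameter of that name
                value = requestParams.get(value.substring(1));
            }
            local.put(key, value);
        }
        String expression = q.substring(end + 1);
        if (!expression.isEmpty()) {
            local.put("v", expression); // text after the brace is the value
        }
        return local;
    }

    public static void main(String[] args) {
        Map<String, String> request = new LinkedHashMap<String, String>();
        request.put("id", "FOO37");
        System.out.println(parse("{!term f=id v=$id}", request));
        System.out.println(parse("{!dismax qf=title}solr rocks", request));
    }
}
```

With the request above, the invariant q={!term f=id v=$id} resolves to a term query on field id for the value FOO37, so the client only ever supplies the id parameter.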
Custom QParser

• Implement a QParserPlugin that creates your
  custom QParser
• Register in solrconfig.xml
 • <queryParser name="myparser"
    class="com.mycompany.MyQParserPlugin"/>
Update Processor

• Responsible for handling these commands:
 • add/update
 • delete
 • commit
 • merge indexes
Built-in Update Processors
•   RunUpdateProcessor
    •   Actually performs the operations, such as
        adding the documents to the index
•   LogUpdateProcessor
    •   Logs each operation
•   SignatureUpdateProcessor
    •   duplicate detection and optionally rejection
UIMA Update Processor
•   UIMA - Unstructured Information Management
    Architecture - http://uima.apache.org/

•   Enables UIMA components to augment
    documents

•   Entity extraction, automated categorization,
    language detection, etc

•   "contrib" plugin

•   http://wiki.apache.org/solr/SolrUIMA
Update Processor Chain
• UpdateProcessors are sequenced into a chain
• Each processor can abort the entire update
  or hand processing to next processor in
  the chain
• Chains, of update processor factories, are
  specified in solrconfig.xml
• Update requests can specify an
  update.processor parameter
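The chain behavior described on this slide (each processor either aborts or hands off to the next) can be modeled in a few lines. This is a sketch with hypothetical classes, assuming documents are plain strings and an abort is signalled by an exception; Solr's real UpdateRequestProcessor works on command objects and has more methods:

```java
import java.util.ArrayList;
import java.util.List;

// Chain-of-responsibility model of an update processor chain; not Solr's classes.
class UpdateChainSketch {
    static abstract class Processor {
        protected final Processor next;

        Processor(Processor next) {
            this.next = next;
        }

        // Default behavior: hand the document to the next processor, if any.
        void processAdd(String doc) {
            if (next != null) next.processAdd(doc);
        }
    }

    // Aborts the whole update by throwing before anything downstream runs.
    static class RejectEmptyProcessor extends Processor {
        RejectEmptyProcessor(Processor next) { super(next); }

        @Override
        void processAdd(String doc) {
            if (doc.isEmpty()) throw new IllegalArgumentException("empty document");
            super.processAdd(doc);
        }
    }

    static class LogProcessor extends Processor {
        private final List<String> log;

        LogProcessor(List<String> log, Processor next) { super(next); this.log = log; }

        @Override
        void processAdd(String doc) {
            log.add("add " + doc);
            super.processAdd(doc);
        }
    }

    // Last link: actually performs the operation.
    static class RunProcessor extends Processor {
        private final List<String> index;

        RunProcessor(List<String> index) { super(null); this.index = index; }

        @Override
        void processAdd(String doc) {
            index.add(doc);
        }
    }

    static Processor buildChain(List<String> index, List<String> log) {
        return new RejectEmptyProcessor(new LogProcessor(log, new RunProcessor(index)));
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<String>();
        List<String> log = new ArrayList<String>();
        buildChain(index, log).processAdd("doc1");
        System.out.println("index=" + index + " log=" + log);
    }
}
```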
Default update processor chain
From SolrCore.java
// construct the default chain
UpdateRequestProcessorFactory[] factories =
  new UpdateRequestProcessorFactory[]{
     new RunUpdateProcessorFactory(),
     new LogUpdateProcessorFactory()
  };

    Note: these steps have been swapped on trunk recently
Example Update Processor
•   What are the best facets to show for a particular
    query? Wouldn't it be nice to see the distribution of
    document "attributes" represented across a result
    set?

•   Learned this trick from the Smithsonian, who were
    doing it manually - add an indexed field containing the
    field names of the interesting other fields on the
    document.

•   Facet on that field "of field names" initially, then
    request facets on the top values returned.
Config for custom update processor
<updateRequestProcessorChain name="fields_used" default="true">
 <processor class="solr.processor.FieldsUsedUpdateProcessorFactory">
  <str name="fieldsUsedFieldName">attribute_fields</str>
  <str name="fieldNameRegex">.*_attribute</str>
 </processor>
 <processor class="solr.LogUpdateProcessorFactory" />
 <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
FieldsUsedUpdateProcessorFactory


public class FieldsUsedUpdateProcessorFactory extends UpdateRequestProcessorFactory {
 private String fieldsUsedFieldName;
 private Pattern fieldNamePattern;

    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                                                  UpdateRequestProcessor next) {
      return new FieldsUsedUpdateProcessor(req, rsp, this, next);
    }

    // ... next slide ...

}
FieldsUsedUpdateProcessorFactory
 @Override
 public void init(NamedList args) {
  if (args == null) return;

     SolrParams params = SolrParams.toSolrParams(args);

     fieldsUsedFieldName = params.get("fieldsUsedFieldName");
     if (fieldsUsedFieldName == null) {
       throw new SolrException
          (SolrException.ErrorCode.SERVER_ERROR,
             "fieldsUsedFieldName must be specified");
     }

     // TODO check that fieldsUsedFieldName is a valid field name and multiValued

     String fieldNameRegex = params.get("fieldNameRegex");
     if (fieldNameRegex == null) {
       throw new SolrException
          (SolrException.ErrorCode.SERVER_ERROR,
             "fieldNameRegex must be specified");
     }
     fieldNamePattern = Pattern.compile(fieldNameRegex);

     super.init(args);
 }
// Assumed to be an inner (non-static) class of FieldsUsedUpdateProcessorFactory:
// the slide does not show the nesting, but the code reads the factory's
// fieldsUsedFieldName and fieldNamePattern fields directly.
class FieldsUsedUpdateProcessor extends UpdateRequestProcessor {
  public FieldsUsedUpdateProcessor(SolrQueryRequest req,
                                   SolrQueryResponse rsp,
                                   FieldsUsedUpdateProcessorFactory factory,
                                   UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();

    // Collect the names of the incoming fields that match the configured regex
    ArrayList<String> usedFields = new ArrayList<String>();
    for (String f : doc.getFieldNames()) {
      if (fieldNamePattern.matcher(f).matches()) {
        usedFields.add(f);
      }
    }

    // Store those names in the configured field, then pass the command down the chain
    doc.addField(fieldsUsedFieldName, usedFields.toArray());
    super.processAdd(cmd);
  }
}
FieldsUsedUpdateProcessor in action
schema.xml
  <dynamicField name="*_attribute" type="string" indexed="true" stored="true" multiValued="true"/>

Add some documents
solr.add([{:id=>1, :name => "Big Blue Shoes", :size_attribute => 'L', :color_attribute => 'Blue'},
          {:id=>2, :name => "Cool Gizmo", :memory_attribute => "16GB", :color_attribute => 'White'}])
solr.commit

Facet on attribute_fields
 - http://localhost:8983/solr/select?q=*:*&facet=on&facet.field=attribute_fields&wt=json&indent=on
      "facet_fields":{
          "attribute_fields":[
             "color_attribute",2,
             "memory_attribute",1,
             "size_attribute",1]}
Search Components
• Built-in: Clustering, Debug, Facet, Highlight,
  MoreLikeThis, Query, QueryElevation,
  SpellCheck, Stats, TermVector, Terms
• Non-distributed API:
 • prepare(ResponseBuilder rb)
 • process(ResponseBuilder rb)
Example - auto facet select
•   It sure would be nice if you could have Solr automatically
    select field(s) for faceting dynamically, based on the
    profile of the results. For example, you're indexing
    disparate types of products, all with varying attributes
    (color, size - like for apparel, memory_size - for
    electronics, subject - for books, etc), and a user searches
    for "ipod", where most matching products have
    color and memory_size attributes... let's automatically
    facet on those fields.

•   https://issues.apache.org/jira/browse/SOLR-2641
AutoFacetSelectionComponent
•   Too much code for a slide, let's take a look in
    an IDE...

•   Basically -

    •   process() gets autofacet.field and autofacet.n
        request params, facets on field, takes top N
        values, sets those as facet.field's

    •   Gotcha - need to call rb.setNeedDocSet(true)
        in prepare() as faceting needs it
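The selection step itself is small. Below is a stdlib-only sketch of "take the top N values and use them as facet.field's", assuming the facet counts on the field-of-field-names are already available as a map and that ties are broken alphabetically; the class and method names are hypothetical and none of this is the code from SOLR-2641:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Model of the top-N selection only; the real component works inside Solr's ResponseBuilder.
class AutoFacetSelectionSketch {
    // counts: value of the "field names" field -> number of matching docs that have it
    static List<String> selectFacetFields(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = b.getValue().compareTo(a.getValue()); // highest count first
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        List<String> fields = new ArrayList<String>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            fields.add(entries.get(i).getKey());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("size_attribute", 1);
        counts.put("memory_attribute", 1);
        counts.put("color_attribute", 2);
        // Each selected name would then be added to the request as a facet.field value.
        System.out.println(selectFacetFields(counts, 2));
    }
}
```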
SearchComponent config
<searchComponent name="autofacet"
     class="solr.AutoFacetSelectionComponent"/>
<requestHandler name="/searchplus"
                class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
    <str>autofacet</str>
    <str>facet</str>
    <str>debug</str>
  </arr>
</requestHandler>
autofacet success
http://localhost:8983/solr/searchplus
?q=*:*&facet=on&autofacet.field=attribute_fields&wt=json&indent=on
{
  "response":{"numFound":2,"start":0,"docs":[
       {
         "size_attribute":["L"],
         "color_attribute":["Blue"],
         "name":"Big Blue Shoes",
         "id":"1",
         "attribute_fields":["size_attribute",
           "color_attribute"]},
       {
         "color_attribute":["White"],
         "name":"Cool Gizmo",
         "memory_attribute":["16GB"],
         "id":"2",
         "attribute_fields":["color_attribute",
           "memory_attribute"]}]
  },
  "facet_counts":{
     "facet_queries":{},
     "facet_fields":{
       "color_attribute":[
         "Blue",1,
         "White",1],
       "memory_attribute":[
         "16GB",1]}}}
Distributed-aware SearchComponents
•   SearchComponent has a few distributed mode
    methods:

    •   distributedProcess(ResponseBuilder)

    •   modifyRequest(ResponseBuilder rb,
        SearchComponent who, ShardRequest sreq)

    •   handleResponses(ResponseBuilder rb,
        ShardRequest sreq)

    •   finishStage(ResponseBuilder rb)
Testing

• AbstractSolrTestCase
• SolrTestCaseJ4
• SolrMeter
 • http://code.google.com/p/solrmeter/
For more information...
•   http://www.lucidimagination.com

•   LucidFind

    •   search Lucene ecosystem: mailing lists, wikis, JIRA, etc

    •   http://search.lucidimagination.com

•   Getting started with LucidWorks Enterprise:

    •   http://www.lucidimagination.com/products/
        lucidworks-search-platform/enterprise

•   http://lucene.apache.org/solr - wiki, e-mail lists
Thank You!

Mais conteúdo relacionado

Mais procurados

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engineth0masr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache SolrBiogeeks
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetuprcmuir
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Murshed Ahmmad Khan
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solrsagar chaturvedi
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHPPaul Borgermans
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsOpenSource Connections
 

Mais procurados (20)

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 
Hacking Lucene for Custom Search Results
Hacking Lucene for Custom Search ResultsHacking Lucene for Custom Search Results
Hacking Lucene for Custom Search Results
 

Destaque

Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst AgainVarun Thacker
 
Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Provectus
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world usesRogue Wave Software
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered LibrariesErik Hatcher
 
Why I want to Kazan
Why I want to KazanWhy I want to Kazan
Why I want to KazanProvectus
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
Apache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesApache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesPeter
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logginglucenerevolution
 
Gimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeGimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeRogue Wave Software
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - ChicagoErik Hatcher
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature PreviewYonik Seeley
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StorySourcesense
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
How to achieve security, reliability, and productivity in less time
How to achieve security, reliability, and productivity in less timeHow to achieve security, reliability, and productivity in less time
How to achieve security, reliability, and productivity in less timeRogue Wave Software
 
Faceted Search And Result Reordering
Faceted Search And Result ReorderingFaceted Search And Result Reordering
Faceted Search And Result ReorderingVarun Thacker
 
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...Provectus
 

Destaque (20)

Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
 
Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"Сергей Моренец: "Gradle. Write once, build everywhere"
Сергей Моренец: "Gradle. Write once, build everywhere"
 
Open source applied: Real-world uses
Open source applied: Real-world usesOpen source applied: Real-world uses
Open source applied: Real-world uses
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered Libraries
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Why I want to Kazan
Why I want to KazanWhy I want to Kazan
Why I want to Kazan
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
Apache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build SitesApache Solr Changes the Way You Build Sites
Apache Solr Changes the Way You Build Sites
 
Multi faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & loggingMulti faceted responsive search, autocomplete, feeds engine & logging
Multi faceted responsive search, autocomplete, feeds engine & logging
 
Gimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source codeGimme shelter: Tips on protecting proprietary and open source code
Gimme shelter: Tips on protecting proprietary and open source code
 
Hackathon
HackathonHackathon
Hackathon
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Faceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents StoryFaceted Search – the 120 Million Documents Story
Faceted Search – the 120 Million Documents Story
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
How to achieve security, reliability, and productivity in less time
How to achieve security, reliability, and productivity in less timeHow to achieve security, reliability, and productivity in less time
How to achieve security, reliability, and productivity in less time
 
Faceted Search And Result Reordering
Faceted Search And Result ReorderingFaceted Search And Result Reordering
Faceted Search And Result Reordering
 
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
Дима Гадомский (Юскутум) “Можно ли позаимствовать дизайн и функционал так, чт...
 

Lucene for Solr Developers

  • 1. Lucene for Solr Developers NFJS - Boston, September 2011 Presented by Erik Hatcher erik.hatcher@lucidimagination.com Lucid Imagination http://www.lucidimagination.com
  • 2. About me... • Co-author, "Lucene in Action" (and "Java Development with Ant" / "Ant in Action" once upon a time) • "Apache guy" - Lucene/Solr committer; member of Lucene PMC, member of Apache Software Foundation • Co-founder, evangelist, trainer, coder @ Lucid Imagination
  • 3. About Lucid Imagination... • Lucid Imagination provides commercial-grade support, training, high-level consulting and value-added software for Lucene and Solr. • We make Lucene ‘enterprise-ready’ by offering: • Free, certified distributions and downloads. • Support, training, and consulting. • LucidWorks Enterprise, a commercial search platform built on top of Solr.
  • 4. What is Lucene? • An open source search library (not an application) • 100% Java • Continuously improved and tuned over more than 10 years • Compact, portable index representation • Programmable text analyzers, spell checking and highlighting • Not a crawler or a text extraction tool
  • 5. Inverted Index • Lucene stores input data in what is known as an inverted index • In an inverted index each indexed term points to a list of documents that contain the term • Similar to the index provided at the end of a book • In this case "inverted" simply means that the terms point to documents, rather than documents pointing to their terms • It is much faster to find a term in an index than to scan all the documents
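The term-to-documents mapping described on this slide can be sketched in a few lines of plain Java. This is only a toy illustration under simplifying assumptions: the class name `TinyInvertedIndex`, its methods, and the crude split-and-lowercase analysis are all made up for this sketch, and Lucene's real index structures are far more compact and carry positions, frequencies and more.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: each term maps to the ascending list of ids of the
// documents that contain it. Hypothetical class, not part of Lucene.
class TinyInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Splits on anything that is not a letter or digit, lowercases, and
    // records each distinct term once per document. Returns the new doc id.
    int add(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Doc ids only grow, so checking the last entry avoids duplicates.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Looking up a term is a single map access; no document is scanned.
    List<Integer> search(String term) {
        List<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? Collections.<Integer>emptyList()
                            : Collections.unmodifiableList(docs);
    }
}
```

After adding a few documents, `search("lucene")` returns the ids of the documents that mention the term, which is the "term points to documents" direction the slide describes.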
  • 7. Segments and Merging • A Lucene index is a collection of one or more sub-indexes called segments • Each segment is a fully independent index • A multi-way merge algorithm is used to periodically merge segments • New segments are created when an IndexWriter flushes new documents and pending deletes to disk • Trying for a balance between large-scale performance vs. small-scale updates • Optimization merges all segments into one
  • 9. Segments • When a document is deleted it still exists in an index segment until that segment is merged • At certain trigger points, the Documents buffered by the IndexWriter are flushed to the Directory • A flush can be forced by calling commit • Segments are periodically merged
  • 13. Committed and Warmed
  • 14. Lucene Scoring • Lucene uses a similarity scoring formula to rank results by measuring the similarity between a query and the documents that match the query. The factors that form the scoring formula are: • Term Frequency: tf (t in d). How often the term occurs in the document. • Inverse Document Frequency: idf (t). A measure of how rare the term is in the whole collection. Conceptually, one over the number of documents in the collection that contain the term. • Terms that are rare throughout the entire collection score higher.
  • 15. Coord and Norms • Coord: The coordination factor, coord (q, d). Boosts documents that match more of the search terms than other documents. • If 4 of 4 terms match coord = 4/4 • If 3 of 4 terms match coord = 3/4 • Length Normalization - Adjust the score based on length of fields in the document. • shorter fields that match get a boost
  • 16. Scoring Factors (cont) • Boost: (t.field in d). A way to boost a field or a whole document above others. • Query Norm: (q). Normalization value for a query, given the sum of the squared weights of each of the query terms. • You will often hear the Lucene scoring simply referred to as TF·IDF.
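For orientation, the factors from the last three slides combine roughly as follows in Lucene's classic TF·IDF similarity. This is a summary sketch; the exact definition of each factor depends on the Similarity implementation in use, and the symbol names below simply follow the slides.

```latex
\mathrm{score}(q, d) =
  \mathrm{coord}(q, d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2}
  \cdot \mathrm{boost}(t) \cdot \mathrm{norm}(t, d) \Big)
```

Here norm(t, d) folds together the length normalization and any index-time boosts for the field, which is why shorter matching fields score higher.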
  • 17. Explanation • Lucene has a feature called Explanation • Solr uses the debugQuery parameter to retrieve scoring explanations
        0.2987913 = (MATCH) fieldWeight(text:lucen in 688), product of:
          1.4142135 = tf(termFreq(text:lucen)=2)
          9.014501 = idf(docFreq=3, maxDocs=12098)
          0.0234375 = fieldNorm(field=text, doc=688)
  • 18. Lucene Core • IndexWriter • Directory • IndexReader, IndexSearcher • analysis: Analyzer, TokenStream, Tokenizer, TokenFilter • Query
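The numbers in this explanation can be recomputed by hand. The sketch below assumes the formulas of Lucene's classic default similarity (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the class and method names are hypothetical and exist only for this illustration.

```java
// Recomputes the fieldWeight shown in the explanation output, assuming
// the classic default similarity formulas. Hypothetical helper class.
class FieldWeightSketch {

    // tf: square root of how often the term occurs in the field
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // idf: grows as fewer documents contain the term
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldWeight is the product of tf, idf and the stored field norm
    static double fieldWeight(int termFreq, int docFreq, int maxDocs, double fieldNorm) {
        return tf(termFreq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        System.out.println("tf          = " + tf(2));
        System.out.println("idf         = " + idf(3, 12098));
        System.out.println("fieldWeight = " + fieldWeight(2, 3, 12098, 0.0234375));
    }
}
```

With termFreq=2, docFreq=3, maxDocs=12098 and fieldNorm=0.0234375 the three values agree with the explanation to within float rounding (about 1.414, 9.0145 and 0.2988).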
  • 20. Customizing - Don't do it! • Unless you need to. • In other words... ensure you've given the built-in capabilities a try, asked on the e-mail list, and spelunked into at least Solr's code a bit to make some sense of the situation. • But we're here to roll up our sleeves, because we need to...
  • 21. But first... • Look at Lucene and/or Solr source code as appropriate • Carefully read javadocs and wiki pages - lots of tips there • And, hey, search for what you're trying to do... • Google, of course • But try out LucidFind and other Lucene ecosystem specific search systems - http://www.lucidimagination.com/search/
  • 22. Extension points • Tokenizer, TokenFilter, CharFilter • SearchComponent • RequestHandler • ResponseWriter • FieldType • Similarity • QParser • DataImportHandler hooks • data sources • entity processors • transformers • several others
  • 23. Factories • FooFactory (most) everywhere. Sometimes there's BarPlugin style • for sake of discussion... let's just skip the "factory" part • In Solr, Factories and Plugins are used by configuration loading to parameterize and construct
  • 24. "Installing" plugins • Compile .java to .class, JAR it up • Put JAR files in either: • <solr-home>/lib • a shared lib when using multicore • anywhere, and register location in solrconfig.xml • Hook in plugins as appropriate
  • 25. Multicore sharedLib
        <solr sharedLib="/usr/local/solr/customlib" persistent="true">
          <cores adminPath="/admin/cores">
            <core instanceDir="core1" name="core1"/>
            <core instanceDir="core2" name="core2"/>
          </cores>
        </solr>
  • 26. Plugins via solrconfig.xml • <lib dir="/path/to/your/custom/jars" />
  • 28. Primer • Tokens, Terms • Attributes: Type, Payloads, Offsets, Positions, Term Vectors • part of the picture:
  • 29. Version • enum: • Version.LUCENE_31, Version.LUCENE_32, etc • Version.onOrAfter(Version other)
  • 30. CharFilter • extend BaseCharFilter • enables pre-tokenization filtering/morphing of incoming field value • only affects tokenization, not stored value • Built-in CharFilters: HTMLStripCharFilter, PatternReplaceCharFilter, and MappingCharFilter
  • 31. Tokenizer • common to extend CharTokenizer • implement - • protected abstract boolean isTokenChar(int c); • optionally override - • protected int normalize(int c) • extend Tokenizer directly for finer control • Popular built-in Tokenizers include: WhitespaceTokenizer, StandardTokenizer, PatternTokenizer, KeywordTokenizer, ICUTokenizer
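To make the `isTokenChar` / `normalize` contract concrete without pulling in Lucene, here is a plain-JDK imitation of it. `SimpleCharTokenizer` and `LowerCaseAlnumTokenizer` are hypothetical names invented for this sketch; they mirror the shape of the two hooks named on the slide but are not Lucene's `CharTokenizer`, which works on a Reader and produces a TokenStream with attributes.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK imitation of the CharTokenizer idea: a subclass decides which
// code points belong to a token and may rewrite each one as it is added.
abstract class SimpleCharTokenizer {

    // Required hook: true if the code point is part of a token.
    protected abstract boolean isTokenChar(int c);

    // Optional hook: rewrite a token code point (identity by default).
    protected int normalize(int c) {
        return c;
    }

    // Walks the input by code point, emitting maximal runs of token chars.
    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int c = input.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(c)) {
                current.appendCodePoint(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Hypothetical example: tokens are runs of letters or digits, lowercased.
class LowerCaseAlnumTokenizer extends SimpleCharTokenizer {
    @Override
    protected boolean isTokenChar(int c) {
        return Character.isLetterOrDigit(c);
    }

    @Override
    protected int normalize(int c) {
        return Character.toLowerCase(c);
    }
}
```

Working on `int` code points rather than `char` values is the same design choice the slide's signatures make: it keeps supplementary characters intact.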
  • 32. TokenFilter • a TokenStream whose input is another TokenStream • Popular TokenFilters include: LowerCaseFilter, CommonGramsFilter, SnowballFilter, StopFilter, WordDelimiterFilter
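The phrase "a TokenStream whose input is another TokenStream" is a decorator pattern, which the following plain-JDK sketch illustrates using `Iterator<String>` as a stand-in for a token stream. All class names here are hypothetical; real Lucene filters extend `TokenFilter` and operate on attributes rather than strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.Set;

// Stand-in for a lowercasing filter: wraps another stream of tokens.
class LowerCaseTokens implements Iterator<String> {
    private final Iterator<String> input;

    LowerCaseTokens(Iterator<String> input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    @Override
    public String next() {
        return input.next().toLowerCase(Locale.ROOT);
    }
}

// Stand-in for a stop-word filter: silently drops tokens found in stopWords.
class StopTokens implements Iterator<String> {
    private final Iterator<String> input;
    private final Set<String> stopWords;
    private String pending; // next token that survived filtering, if any

    StopTokens(Iterator<String> input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    @Override
    public boolean hasNext() {
        while (pending == null && input.hasNext()) {
            String candidate = input.next();
            if (!stopWords.contains(candidate)) {
                pending = candidate;
            }
        }
        return pending != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String token = pending;
        pending = null;
        return token;
    }
}

class FilterChainDemo {
    // Chains the filters: lowercase first, then stop-word removal.
    static List<String> analyze(List<String> tokens, Set<String> stopWords) {
        Iterator<String> stream =
            new StopTokens(new LowerCaseTokens(tokens.iterator()), stopWords);
        List<String> out = new ArrayList<>();
        while (stream.hasNext()) {
            out.add(stream.next());
        }
        return out;
    }
}
```

Order matters in such a chain: lowercasing before stop-word removal lets a lowercase stop list catch "The" as well as "the".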
  • 33. Lucene's analysis APIs • tricky business, what with Attributes (Source/Factory's), State, characters, code points, Version, etc... • Test!!! • BaseTokenStreamTestCase • Look at Lucene and Solr's test cases
  • 34. Solr's Analysis Tools • Admin analysis tool • Field analysis request handler • DEMO
  • 35. Query Parsing • String -> org.apache.lucene.search.Query
  • 36. QParserPlugin
        public abstract class QParserPlugin implements NamedListInitializedPlugin {
          public abstract QParser createParser(
              String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req);
        }
  • 37. QParser
        public abstract class QParser {
          public abstract Query parse() throws ParseException;
        }
  • 38. Built-in QParsers from QParserPlugin.java
        /** internal use - name to class mappings of builtin parsers */
        public static final Object[] standardPlugins = {
          LuceneQParserPlugin.NAME, LuceneQParserPlugin.class,
          OldLuceneQParserPlugin.NAME, OldLuceneQParserPlugin.class,
          FunctionQParserPlugin.NAME, FunctionQParserPlugin.class,
          PrefixQParserPlugin.NAME, PrefixQParserPlugin.class,
          BoostQParserPlugin.NAME, BoostQParserPlugin.class,
          DisMaxQParserPlugin.NAME, DisMaxQParserPlugin.class,
          ExtendedDismaxQParserPlugin.NAME, ExtendedDismaxQParserPlugin.class,
          FieldQParserPlugin.NAME, FieldQParserPlugin.class,
          RawQParserPlugin.NAME, RawQParserPlugin.class,
          TermQParserPlugin.NAME, TermQParserPlugin.class,
          NestedQParserPlugin.NAME, NestedQParserPlugin.class,
          FunctionRangeQParserPlugin.NAME, FunctionRangeQParserPlugin.class,
          SpatialFilterQParserPlugin.NAME, SpatialFilterQParserPlugin.class,
          SpatialBoxQParserPlugin.NAME, SpatialBoxQParserPlugin.class,
          JoinQParserPlugin.NAME, JoinQParserPlugin.class,
        };
  • 39. Local Parameters • {!qparser_name param=value}expression • or • {!qparser_name param=value v=expression} • Can substitute $references from request parameters
  • 40. Param Substitution
        solrconfig.xml
          <requestHandler name="/document" class="solr.SearchHandler">
            <lst name="invariants">
              <str name="q">{!term f=id v=$id}</str>
            </lst>
          </requestHandler>
        Solr request
          http://localhost:8983/solr/document?id=FOO37
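The local-parameter syntax and the `$reference` substitution from the last two slides can be sketched as a small parser. This is a deliberately simplified illustration, not Solr's implementation: `LocalParamsSketch` is a hypothetical class, it assumes whitespace-separated `key=value` pairs, and it ignores quoting and escaping entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of {!qparser_name param=value}expression handling.
// Hypothetical class; quoting and escaping are deliberately not handled.
class LocalParamsSketch {
    final String parserName;               // null when there is no {!...} prefix
    final Map<String, String> localParams; // key=value pairs from the prefix
    final String expression;               // the query text handed to the parser

    private LocalParamsSketch(String parserName, Map<String, String> localParams,
                              String expression) {
        this.parserName = parserName;
        this.localParams = localParams;
        this.expression = expression;
    }

    static LocalParamsSketch parse(String q, Map<String, String> requestParams) {
        Map<String, String> local = new LinkedHashMap<>();
        if (!q.startsWith("{!")) {
            return new LocalParamsSketch(null, local, q);
        }
        int end = q.indexOf('}');
        if (end < 0) {
            throw new IllegalArgumentException("unterminated local params: " + q);
        }
        String parserName = null;
        for (String part : q.substring(2, end).trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;
            }
            int eq = part.indexOf('=');
            if (eq < 0) {
                parserName = part; // a bare token names the query parser
                continue;
            }
            String value = part.substring(eq + 1);
            if (value.startsWith("$")) {
                // $name is looked up among the request parameters
                value = requestParams.get(value.substring(1));
            }
            local.put(part.substring(0, eq), value);
        }
        // A v= parameter supplies the expression; otherwise it follows the '}'
        String expression = local.containsKey("v") ? local.get("v") : q.substring(end + 1);
        return new LocalParamsSketch(parserName, local, expression);
    }
}
```

Parsing `{!term f=id v=$id}` with a request parameter `id=FOO37` yields parser `term`, local param `f=id`, and expression `FOO37`, which is the effect of the `/document` handler on the slide.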
  • 41. Custom QParser • Implement a QParserPlugin that creates your custom QParser • Register in solrconfig.xml • <queryParser name="myparser" class="com.mycompany.MyQParserPlugin"/>
  • 42. Update Processor • Responsible for handling these commands: • add/update • delete • commit • merge indexes
  • 43. Built-in Update Processors • RunUpdateProcessor • Actually performs the operations, such as adding the documents to the index • LogUpdateProcessor • Logs each operation • SignatureUpdateProcessor • duplicate detection and optionally rejection
  • 44. UIMA Update Processor • UIMA - Unstructured Information Management Architecture - http://uima.apache.org/ • Enables UIMA components to augment documents • Entity extraction, automated categorization, language detection, etc • "contrib" plugin • http://wiki.apache.org/solr/SolrUIMA
  • 45. Update Processor Chain • UpdateProcessors are sequenced into a chain • Each processor can abort the entire update or hand processing to the next processor in the chain • Chains of update processor factories are specified in solrconfig.xml • Update requests can specify an update.processor parameter
  • 46. Default update processor chain
        From SolrCore.java
          // construct the default chain
          UpdateRequestProcessorFactory[] factories = new UpdateRequestProcessorFactory[]{
            new RunUpdateProcessorFactory(),
            new LogUpdateProcessorFactory()
          };
        Note: these steps have been swapped on trunk recently
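The "abort or hand to the next processor" behaviour is a chain of responsibility. The plain-JDK sketch below shows that control flow only; every class name in it is hypothetical, documents are modelled as simple maps, and none of it is Solr's `UpdateRequestProcessor` API.

```java
import java.util.List;
import java.util.Map;

// Base link of the chain: by default it just passes the document along.
abstract class SketchProcessor {
    private final SketchProcessor next;

    SketchProcessor(SketchProcessor next) {
        this.next = next;
    }

    void processAdd(Map<String, Object> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Aborts the whole update by throwing instead of calling the next link.
class RequireIdProcessor extends SketchProcessor {
    RequireIdProcessor(SketchProcessor next) {
        super(next);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        if (doc.get("id") == null) {
            throw new IllegalArgumentException("document has no id");
        }
        super.processAdd(doc);
    }
}

// Records each operation, then hands the document on.
class LogProcessor extends SketchProcessor {
    private final List<String> log;

    LogProcessor(List<String> log, SketchProcessor next) {
        super(next);
        this.log = log;
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        log.add("add " + doc.get("id"));
        super.processAdd(doc);
    }
}

// Last link: actually "indexes" the document (here, appends it to a list).
class RunProcessor extends SketchProcessor {
    private final List<Map<String, Object>> index;

    RunProcessor(List<Map<String, Object>> index) {
        super(null);
        this.index = index;
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        index.add(doc);
    }
}
```

Building the chain as `new RequireIdProcessor(new LogProcessor(log, new RunProcessor(index)))` means a document without an id never reaches the log or the index, while a valid one passes through every link in order.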
  • 47. Example Update Processor • What are the best facets to show for a particular query? Wouldn't it be nice to see the distribution of document "attributes" represented across a result set? • Learned this trick from the Smithsonian, who were doing it manually - add an indexed field containing the field names of the interesting other fields on the document. • Facet on that field "of field names" initially, then request facets on the top values returned.
  • 48. Config for custom update processor
        <updateRequestProcessorChain name="fields_used" default="true">
          <processor class="solr.processor.FieldsUsedUpdateProcessorFactory">
            <str name="fieldsUsedFieldName">attribute_fields</str>
            <str name="fieldNameRegex">.*_attribute</str>
          </processor>
          <processor class="solr.LogUpdateProcessorFactory" />
          <processor class="solr.RunUpdateProcessorFactory" />
        </updateRequestProcessorChain>
  • 49. FieldsUsedUpdateProcessorFactory
        public class FieldsUsedUpdateProcessorFactory extends UpdateRequestProcessorFactory {
          // Both values come from the <str> settings of the processor in solrconfig.xml (see init)
          private String fieldsUsedFieldName;
          private Pattern fieldNamePattern;

          public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                                    UpdateRequestProcessor next) {
            return new FieldsUsedUpdateProcessor(req, rsp, this, next);
          }

          // ... next slide ...
        }
  • 50. FieldsUsedUpdateProcessorFactory
        @Override
        public void init(NamedList args) {
          if (args == null) return;

          SolrParams params = SolrParams.toSolrParams(args);

          // Name of the field that will receive the matching field names
          fieldsUsedFieldName = params.get("fieldsUsedFieldName");
          if (fieldsUsedFieldName == null) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                "fieldsUsedFieldName must be specified");
          }
          // TODO check that fieldsUsedFieldName is a valid field name and multiValued

          // Regular expression selecting which incoming field names to record
          String fieldNameRegex = params.get("fieldNameRegex");
          if (fieldNameRegex == null) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                "fieldNameRegex must be specified");
          }
          fieldNamePattern = Pattern.compile(fieldNameRegex);

          super.init(args);
        }
  • 51.
        // Uses fieldNamePattern and fieldsUsedFieldName, which are declared on the factory
        class FieldsUsedUpdateProcessor extends UpdateRequestProcessor {
          public FieldsUsedUpdateProcessor(SolrQueryRequest req, SolrQueryResponse rsp,
                                           FieldsUsedUpdateProcessorFactory factory,
                                           UpdateRequestProcessor next) {
            super(next);
          }

          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();

            // Collect the names of the incoming fields that match the configured pattern
            Collection<String> incomingFieldNames = doc.getFieldNames();
            Iterator<String> iterator = incomingFieldNames.iterator();
            ArrayList<String> usedFields = new ArrayList<String>();
            while (iterator.hasNext()) {
              String f = iterator.next();
              if (fieldNamePattern.matcher(f).matches()) {
                usedFields.add(f);
              }
            }

            // Store the matching names on the document, then continue down the chain
            doc.addField(fieldsUsedFieldName, usedFields.toArray());
            super.processAdd(cmd);
          }
        }
  • 52. FieldsUsedUpdateProcessor in action
        schema.xml
          <dynamicField name="*_attribute" type="string" indexed="true" stored="true" multiValued="true"/>
        Add some documents
          solr.add([{:id=>1, :name => "Big Blue Shoes", :size_attribute => 'L', :color_attribute => 'Blue'},
                    {:id=>2, :name => "Cool Gizmo", :memory_attribute => "16GB", :color_attribute => 'White'}])
          solr.commit
        Facet on attribute_fields - http://localhost:8983/solr/select?q=*:*&facet=on&facet.field=attribute_fields&wt=json&indent=on
          "facet_fields":{
            "attribute_fields":[
              "color_attribute",2,
              "memory_attribute",1,
              "size_attribute",1]}
  • 53. Search Components • Built-in: Clustering, Debug, Facet, Highlight, MoreLikeThis, Query, QueryElevation, SpellCheck, Stats, TermVector, Terms • Non-distributed API: • prepare(ResponseBuilder rb) • process(ResponseBuilder rb)
  • 54. Example - auto facet select • It sure would be nice if you could have Solr automatically select the field(s) to facet on, based dynamically on the profile of the results. For example, you're indexing disparate types of products, all with varying attributes (color and size for apparel, memory_size for electronics, subject for books, etc), and a user searches for "ipod", where most matching products have color and memory_size attributes... let's automatically facet on those fields. • https://issues.apache.org/jira/browse/SOLR-2641
  • 55. AutoFacetSelection Component • Too much code for a slide, let's take a look in an IDE... • Basically - • process() gets autofacet.field and autofacet.n request params, facets on field, takes top N values, sets those as facet.field's • Gotcha - need to call rb.setNeedDocSet(true) in prepare() as faceting needs it
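Since the component's code is not on the slides, here is a plain-JDK sketch of the two-pass idea only: facet on the "field of field names", keep the top N values, then facet on those fields. `AutoFacetSketch` and its methods are hypothetical, documents are modelled as maps of multi-valued fields, and the tie-break by name is an assumption of this sketch rather than something the slides state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Two-pass "auto facet" sketch over in-memory documents. Hypothetical class.
class AutoFacetSketch {

    // Counts how many documents carry each value of a multi-valued field.
    static Map<String, Integer> facet(List<Map<String, List<String>>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> doc : docs) {
            List<String> values = doc.get(field);
            if (values == null) {
                continue;
            }
            for (String value : values) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the n values with the highest counts (ties broken by name).
    static List<String> topValues(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Collections.sort(entries, (a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // Pass 1: facet on the field of field names. Pass 2: facet on the top n of them.
    static Map<String, Map<String, Integer>> autoFacet(
            List<Map<String, List<String>>> docs, String autofacetField, int n) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (String field : topValues(facet(docs, autofacetField), n)) {
            result.put(field, facet(docs, field));
        }
        return result;
    }
}
```

Run against the two example documents from the FieldsUsedUpdateProcessor slide with `attribute_fields` and n = 2, the first pass selects `color_attribute` and `memory_attribute`, and the second pass counts their values.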
  • 56. SearchComponent config
        <searchComponent name="autofacet" class="solr.AutoFacetSelectionComponent"/>
        <requestHandler name="/searchplus" class="solr.SearchHandler">
          <arr name="components">
            <str>query</str>
            <str>autofacet</str>
            <str>facet</str>
            <str>debug</str>
          </arr>
        </requestHandler>
  • 57. autofacet success
        http://localhost:8983/solr/searchplus?q=*:*&facet=on&autofacet.field=attribute_fields&wt=json&indent=on
        {
          "response":{"numFound":2,"start":0,"docs":[
            {
              "size_attribute":["L"],
              "color_attribute":["Blue"],
              "name":"Big Blue Shoes",
              "id":"1",
              "attribute_fields":["size_attribute", "color_attribute"]},
            {
              "color_attribute":["White"],
              "name":"Cool Gizmo",
              "memory_attribute":["16GB"],
              "id":"2",
              "attribute_fields":["color_attribute", "memory_attribute"]}]
          },
          "facet_counts":{
            "facet_queries":{},
            "facet_fields":{
              "color_attribute":[
                "Blue",1,
                "White",1],
              "memory_attribute":[
                "16GB",1]}}}
  • 58. Distributed-aware SearchComponents • SearchComponent has a few distributed mode methods: • distributedProcess(ResponseBuilder) • modifyRequest(ResponseBuilder rb, SearchComponent who, ShardRequest sreq) • handleResponses(ResponseBuilder rb, ShardRequest sreq) • finishStage(ResponseBuilder rb)
  • 59. Testing • AbstractSolrTestCase • SolrTestCaseJ4 • SolrMeter • http://code.google.com/p/solrmeter/
  • 60. For more information... • http://www.lucidimagination.com • LucidFind • search Lucene ecosystem: mailing lists, wikis, JIRA, etc • http://search.lucidimagination.com • Getting started with LucidWorks Enterprise: • http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise • http://lucene.apache.org/solr - wiki, e-mail lists