Solr/Elasticsearch for CF Developers (and others)

Solr/ElasticSearch
for CF Developers
By Mary Jo Sminkey

Who Am I?
• Senior Web Developer at CFWebtools, LLC
• ColdFusion Developer since CF3/Allaire
• Cosplayer
• Dog Trainer
• Sewer/Baker/Knitter/Origamist/Ass. Crafts
• Cancer Survivor
• Fibromyalgia/Invisibile Disabilities Advocate

What is Solr?
• Standalone Full-Text Search Engine with Apache Lucene Backend
• Open-source, distributed, highly scalable, enterprise grade search
• http://lucene.apache.org/solr/
• Included in ColdFusion since CF9 replacing Verity (cfsearch/cfindex)
• Thru CF11 – Solr 3
• CF2016 – Solr 5 (5.1.2)
• Current Release of Solr is version 6.2.0

Why use Solr/ES instead of CF tags?
• Any CF version prior to 2016 has ancient Solr 3.x versions
• Full Access to latest Solr/ES versions and patches
• Ability to use cloud-based distributed setups (essential for enterprise
sites)
• Access to far more features and use of REST/JSON
• Code more easily converted to other search engines and languages

Solr vs. ElasticSearch – What should I use?
• Solr has been around a lot longer so more mature, very well documented
and has strong backwards compatibility. Developers often mention ES not
nearly as well documented so plan on investing in other sources to really get
a handle on it.
• ES being younger is built on modern standards and ideas (particularly
REST), designed specifically for handling large indices and high query rates,
and since it isn’t as strictly commuinty driven often can move forward quicker
with new features, bug fixes, etc. (although co
• Both have very active communities and are very actively still being
developed and moved forward. Solr in particular has pretty much caught
up to many of the advances brought by ElasticSearch entering the
marketplace such as full REST support.

(cont)
• Solr excels at text-search applications, ElasticSearch for analytics (lots
of monitoring and metrics exposed).
• In areas like log analysis, ES is by far the more common choice to
use. This is due to its very advanced “aggregations” framework, which
replaced earlier faceting.
• https://www.elastic.co/blog/out-of-this-world-aggregations
• Solr uses a terse syntax, vs. ES which is much more verbose
• This makes ES generally more readible, but the terse syntax of Solr
make more advanced relevancy possibilites easier to handle, of
particular interest in text-search applications.

(cont)
• SearchComponents in Solr allow for much more easily customizable
searches that can be easily reused across multiple applications or
within an application.
• ES generally considered a bit easier to get started with and do
clustering etc. Solr generally requires a bit more work to get your head
around, forces you to read over and learn the config files to get
running for instance - but this is not necessarily a BAD thing.
• If you are going to use REST, both now have excellent support
although ES more REST-compliant. But if you plan to go another
route, Solr tends to have better support, for instance it has excellent
Java support via the Solrj library.

Amazon CloudSearch
• There are some other Lucene-based searches you can consider.
• Most popular of these is Amazon CloudSeach
• Easy to set up, AWS managed service with automatic scaling
• Provides most commonly used text-search features like highlighting,
autocomplete, simple faceting, grouping, geospatial search, etc.
• It is considerably more limited that Solr and ES when it comes to doing
advanced search relevancy tuning and/or advanced metrics.

Solr vs. ElasticSearch – More Reading
• http://solr-vs-elasticsearch.com/
• https://sematext.com/blog/2015/01/30/solr-elasticsearch-comparison/
• https://www.datanami.com/2015/01/22/solr-elasticsearch-question/
• http://opensourceconnections.com/blog/2015/12/15/solr-vs-elasticsearch-
relevance-part-one/
• http://opensourceconnections.com/blog/2016/01/22/solr-vs-elasticsearch-
relevance-part-two/
• http://harish11g.blogspot.com/2015/07/amazon-cloudsearch-vs-
elasticsearch-vs-Apache-Solr-comparison-report.html

So let’s look at the features of Solr
(particularly Solr 4+ versions)
• Full REST API for schema management, indexing, searching, etc.
(Solr 5+)
• Wide variety of built in tokenizers and analyzers
• Grouping, faceting, highlighting, spelling suggestions, autocomplete
• Filtering, document and field boosting, custom ranking, etc.
• Near Real-Time Indexing
• Extensible Plugin Architecture

Solr 6 Features
• Parallel SQL – The big WOW feature of version 6 is bringing SQL support to
Solr which works across SolrCloud collections. This is done by a SQL parser
that converts SQL queries to Solr streaming expressions.
• SQL Request Handler – SolrCloud collections can be queried with standard
SQL language using the /sql request handler.
• JDBC Driver – Connect to the SolrCloud collections with any tool that
supports JDBC and query the collection directly
• Still somewhat experimental and not quite ready for primetime usage but
improving rapidly.

Solr 6 Features (cont)
• Many other improvements and advancements with streaming
expressions (Merging search with parallel computing, across multiple
sources)
• Push/Pull streaming, request/response streaming
• Solr collections able to auto-update itself via these kinds of streaming
commands.
• https://sematext.com/blog/tag/streaming-expressions/

Let’s Focus on Text Searches!
• This is primarily what CF developers would have been using Solr
integration for – cfsearch/cfindex
• While a number of things we’re going to look at are included in the CF
integration, you can do a lot more once you move to standalone Solr.

Our Target Site and Objectives
• Classic industries (classicindustries.com)
• Ecommerce Site for Classic Car Parts
• Customers can select a car model (catalog) and year to filter their search
• Single text box search that needs to search across multiple fields but return
the best possible matches
• Search pages need to also include data like breadcrumb trail, category
menus, nested structure of category totals, etc.
• We would like to add additional elements like spelling suggestions and
highlighting.

Step 1 - Schema
• The schema defines the fields and their types that will be indexed for
searching.
• Solr/ES can both be used schema or schema-less, support dynamic
field types, etc. Typically you would only use schema-less for
development and then switch to the managed schema for production.
• Solr 5 and up can handle most schema changes via the REST
service.
• You can also make schema changes via the Solr admin console

Sample REST – Add Field Type
POST /schema
Content-type: application/json
{
"add-field-type": {
"name": "simpleTextSpell",
"class": "solr.TextField",
"positionIncrementGap":100,
"indexAnalyzer": {
"tokenizer": {
"class": "solr.StandardTokenizerFactory"
},
"filters": [
{
"class": "solr.LowerCaseFilterFactory"
},
{
"class": "solr.RemoveDuplicatesTokenFilterFactory"
}
]
}
}
}

Sample REST – Add Field
POST /schema
{
"add-field": {
"name": "simpleSpell",
”type": "simpleTextSpell",
”indexed": true,
”stored”: true
}
}

Schema - Analyzers, Tokenizers and Filters
• These are used to tell Solr how to prepare the text string for indexing
(and/or quering).
• Proper handling of this step is essential for good search results.
• While Solr has a lot of built-in field types for text fields, you may
oftenneed to add your own field types to get the best results.
• With simple text fields, you often will use the same analyzer for the
indexing and the query steps. The more complex handling your field
needs, the more likely you may need different analyzers for the
indexing vs. the querying.

Schema - Sample Analyzers
<fieldType name="nametext" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
<filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
</analyzer>
<analyzer type="query”>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

Schema - Tokenizers
• Tokenizers determine how the text string will be split up into “tokens”.
A common first step is to split the string on whitespace and/or
punctuation (sentences, etc. split into the individual words so we can
search on them instead of the entire */string).
• Solr includes a whole variety of tokenizers includes ones designed for
specific kinds of data, like file paths or email addresses, as well as to
handle multi-lingual text.
• You can also process your token using regular expressions…. Or
return the entire field as a single token.

Schema - Filters
• Filters are used after tokenizing to further manipulate your data.
• Some common filters actions are to convert everything to lowercase
so searches aren’t case-sensitive and discarding common words that
aren’t useful in searches (a, and, the, etc.)
• A dictionary filter might be used on a field that you intend to use for
spelling corrections.
• You can also use a synonym filter to create word mappings to match
on.

Schema - CharFilters
• Unlike regular filters, charFilters are used PRIOR to tokenizing your
data.
• You might use these to do things like strip out HTML tags, comments,
or any other text you don’t want your search to find.
• Solr includes both a charFilter specifically for removing HTML markup
as well as a Regex style replace filter.

Schema - Copy Fields
• Your schema can include fields which just copy data from other fields
to index. Typically they will have a different set of analyzers to
manipulate the data.
• For example, I may want my search to return matches on the original
words higher than the ones that match on a synonym. To do this, I
would use a field copy for the synonym matches which will give lower
ranking.
• Another common use is for spell checking in which you may want to
copy text fields that use different field types to the one you use for
spellchecking.

Schema in Place, Let’s Index Our Data!
• In the past most CF code would use the SolrJ library for standalone
Solr work.
• This is still an option but now we have REST as an alternative. The
REST libraries are generally much easier to work with since you don’t
have to figure out all the nested methods that are holding your data,
it’s just all returned in a simple JSON object.
• There is very little now in Solr that you cannot do through REST,
including adding or modifying cores, making all schema changes, and
of course indexing and searching.

Schema in Place, Let’s Index Our Data! (cont)
• Solr’s REST integration has been continually improved but you may still run
into some gotchas. For instance, the handling of multiple documents in an
indexing request when you need to include additional parameters like a
custom boost makes it impossible in most languages to simply convert a
native object to the necessary JSON object (multiple name-value pairs in an
object with the same name).
• ColdFusion has its own quirks (bugs) that you have to watch out for. The
most common one you’ll run into is it trying to treat a string that is all
numbers as a numeric value and not wrapping it in quotes. A typical hack is
to add some string in front of any such field prior to CF serializing and then
doing a search and replace to remove it prior to the REST request.

Sample REST – Indexing Data
POST //update?wt=json
{
"add": {
{ "doc" :
{
”productname": ”Sample AC Part",
“sku”: “AC234”,
”catalogs" : [”Camaro”, “Impala”,”Firebird”],
"id" : 8753 }
} ,
“boost”: 2.0
}
"add": {
{ "doc" :
{
”productname": ”Sample Body Part",
“sku”: “BD495”,
”catalogs" : [”Camaro”],
"id" : 968944}
}
}
}

More on Indexing
• Adding and updating data uses the same request format, if you
include a key that is already in the index, Solr will update it.
• However keep in mind that if you are making schema or major data
changes, that updating won’t REMOVE old keys. To do so you need to
either locate those and send delete requests, or you need to purge
your data and then do a clean re-indexing (advantage of SolrCloud
over using same server to index and search).
• You can delete either by key, or by query. For example, purge all data
with deleteByQuery('*.*')

More on Indexing (cont.)
• You can also index bulk data using a data import handler, which can
be done in a number of formats.
• By default Solr doesn’t have any security on the admin or managed
schema, so you will want to lock it down for production servers.
• The solr config allows you to specific auto-commit times, replication to
slave servers, etc.
• Soft auto-commits can be used so updates can be made live almost
immediately without the overhead of doing a hard commit (near real-
time search).

We Have Data, Now Let’s Search!
• Solr comes with some built-in request handlers you can use or
customize, or you can add your own.
• The request handler configuration determines what settings, defaults
and components (like spellcheck) are available for requests to that
handler.
• The simpliest search is just to send the query parameter “q” to the
search request: /select?q=front+bumper
• Search results can be returned in a variety of formats, including json,
xml, csv and language-specific formats like php and ruby.

Query Parsers
• The parser used determines what parameters you can use.
• For text searching you generally will use the Dismax or Extended
Dismax parsers, which allow for improving the relevance of your
search results.
• Dismax includes term boosting, phrase boosting and minimum-should-
match parameters among others.
• Extended Dismax extends this with even more boosting options
including field boosting, more phrase boosting options, proximity
boosting, and ignoring stopwords at query time.

Filters
• All types of Solr query parsers support filters.
• This is the most basic way of restricting what documents to search.
• In our sample site application, we add filters based on things like the
catalog and year the user selects, if they are looking for new or outlet
products, if they have drilled down into a category, etc.
• You can have any number of filters and they can include complex
boolean expressions.

Filter Examples
• fq=catalogid:1
• fq=year:1967
• fq=newproduct:true
• fq=catalogid:(1 TO 15)
• fq=(discontinueflg:N OR availablecount:[1 TO *])

Search Relevance
• This is a topic we could spend an entire day on.
• Many enterprise sites have regular search audits and do extensive
analysis to look at their relevancy scores and how to improve them.
• We’ll take a quick look at our example site and some things that Solr
allows us to do in order to improve our search relevancy.
• We are using Extended Dismax for the maximum possible options for
controlling search relevancy.

Search Relevance (cont)
• By default Solr is scoring documents by how many times the search
terms are found.
• What we want to do is “boost” fields and documents, etc. that we want
Solr to place more emphasis on.
• We want to also look at how to handle searches that include multiple
terms to search on (phrases).

Search Relevance – Example App Fields
• Product Number (SKU) – we want to put matches on the SKU right at
the top in searches.
• Product Name – next most important is matches on the product name.
All the most relevant search terms are included in the product name.
• Keywords – custom keywords have additional search terms and
abbreviations we want to match for the product so are fairly relevant.
• Product Info – this is full description of the product that can be used for
searches but due to the extensive amount of text and non-related
words it can have, it’s of fairly low importance

Search Relevance – Synonyms
• Solr has support for synonyms which allow you to map words to
similar ones that you want it to also consider a match.
• Synonyms can be one-way or bi-directional. For instance if there is a
common misspelling people use in a search, you would map that in
one directon only, to the correct spelling.
• Solr does not properly handle multi-term synonyms (see the ‘sea
biscuit’ problem). This is a long-standing bug and there are some
plugins to try and correct for it but they often result in issues with more
complex relevancy setups.

Search Relevance – Sample App Synonyms
• Since we want matches on the original search term to always appear
higher than matches for synonyms, we need to copy the fields used so
we can boost them separately. These fields only need to be indexed,
not stored.
• prodnamesynonym – Product Name Synonym Field. This will get a
boost high enough to help matches appear above most of the other
fields, but not as high as the original product name field.
• proddatasynonym – Additional Product Data Synonym Field. We’ll
copy all the other text fields to this one and give it the lowest boost
score.

Search Relevance – Boosting
• The default value that Solr gives for boosts is 1.0
• Solr does not support negative boosts but anything below 1.0 is
basically a negative boost based on the default.
• Keep in mind as well that Solr is going to score documents on how
often the search terms appear as well. You can use a filter in your
schema to remove duplicate tokens if you don’t want it to do this.
• The boosts for your query are set on the “qf” parameter which tells
Solr which fields you want to query.

Sample App Boosting
prodnumbertext^20.0
prodname^10.0
prodnamesynonym^5.0
keywords^2.0
productinfo^1.0
proddatasynonym^0.25
http://localhost:8983/solr/classic/select?q=front+bumper&defType=edismax&qf=proddatasynonym^0.25+productinfo^1.0+keywords^2.0+prodna
mesynonym^5.0+prodname^10.0+prodnumbertext^20.0

Search Relevance – Phrase Boosting
• Phrase boosting is used for multi-term searches.
• By default, Solr will score documents the same no matter where the
search terms appear in the documents.
• Phrase boosting allows you to score higher the documents where the
search terms are appearing next to, or close to, each other.
• The original phrase boost from the Dismax query parser boosts only
for all search terms being close together. Edismax adds options for 2
and 3-word phrases in your search terms.

Search Relevance – Phrase Boosting (cont)
• The phrase slop setting is used to set how far away terms can be in
order to be consider a match for phrase boost.
• If you set 2 and 3 word phrase boosting, you can use different slop
settings for them.
• Phrase boosting doesn’t have any effect on what documents are
returned by the search, ONLY how they get scored.

Sample App Phrase Boosting
prodname^50.0
prodnamesynonym^25.0
keywords^10.0
productinfo^5.0
proddatasynonym^0.25
http://localhost:8983/solr/classic/select?q=front+bumper&defType=edismax&pf=proddatasynonym^0.25+productinfo^5+keywords^10+prodname
synonym^25+prodname^50&pf2=proddatasynonym^0.25+productinfo^5+keywords^10+prodnamesynonym^25+prodname^50&pf3=proddatasy
nonym^0.25+productinfo^5+keywords^10+prodnamesynonym^25+prodname^50&ps=1

Search Relevance – More Boosting
• There are other boosting options, such as boosting a specific term in
the search, telling Solr to boost the documents that match that
particular term over other terms in the search.
• You can also create complex functions for boosting documents.
• Another common boost is to apply one during indexing to specific
documents. For instance in our classic car site, we apply a boost to
the products that are our best sellers, so that all things being equal,
those products will appear higher in searches.

Minimum-Should-Match
• Solr supports both AND and OR searches which you can set with the q.op
parameter.
• When doing an OR search with multiple search terms, you can also set the minimum
number of terms that have to be matched to return a document.
• For instance if mm=75% and the user enters 4 search terms, then only 3 have to
match to return the document.
• This helps ensure that as users enter more search terms that have a higher chance
of not matching, you can make sure the results still require matching on more than
just a single term (typical OR search) without causing a high percentage of failed
searches.
• There are various options for how to set the minimum match, and you can customize
it based on the number of search terms that were entered.

Example Minimum-Should-Match
• Match 75% of the terms
mm=75%
• 25% of the terms can be missing (same result as previous)
mm=-25%
• For more than 1 term, allow 1 to be missing. For more than 4 terms,
allow 2 mising. For more than 6 terms, allow 33% missing.
mm=1<-1 4<-2 6<-33%

Result Grouping
• Allows you to group results with a similar value in a specific field together.
• This is something you may commonly have to do for something like an
ecommerce site where a single product appears in multiple categories but
you only want to show (or count) one copy of each product.
• Solr gives you a very wide range of options over how to handle the groups.
• You can only group on a single field which should be indexed and a text field.
• The best way to show off what grouping can do is look at our example site.

Result Grouping - Example
• Our classic car parts site has products that appear in multiple car
model catalogs as well as in multiple categories (3-tiers of categories).
• We’ve indexed them based on this combination of unique product id,
catalog id and 3 category ids.
• So when we search, we may get multiples of a specific product and
need to group on the product id to get accurate product counts.
• We also want to simplify the groups as much as possible so that we
don’t have to do a lot of extra work to get to the data.

Group Parameters for Search
<cfscript>
var params = {};
params["group"] = true;
params["group.field"] = ‘product_id’;
// tells Solr to return the total number of groups found
//this will be our total number of products found
params["group.ngroups"] = true;
//this gives us the grouped documents in a flat list
params["group.format"] = 'simple';
//since the groups are just copies of the same product,
//we only need one document in each group
params["group.limit"] = 1;
</cfscript>

Result Grouping – Cont.
• This is just one way to use groups. You can create groups based on
functions and do much, much more with them.
• You can sort both the list of groups (based on the most relevant
document in each) as well as inside the groups themselves.
• Likewise paging can be done both inside and external to the grouping.

Collapse and Expand Results
• These are an alternative to result grouping that are useful particularly
for displaying collapsed search results.
• You provide the field to “collapse” on and Solr will provide results with
a single document per group for that field.
• The expand is then used to tell Solr to return the same query but this
time with an “expanded” section that includes all the documents in the
groups.

Faceting
• Faceting is widely used in commerce sites to show the customer result
counts for numerous search criteria.
• This is where a search engine like Solr really shines over using
database-only solutions that require additional queries and often
extensive processing to come up with the same data.
• You can request any number of facets as part of your query, and get
back result counts for each of them.
• Solr provides a number of different ways to do faceting, we’ll look at
some of the most common.

Faceting – Field/Value
• This facets on a single field based on the value.
• For text fields, you typically don’t want to do any stemming, etc. but just
tokenize the value in the field as-is. In cases where you want to search on
the field and do more analysis as well, a copy field can be used for the
faceting.
• You can restrict matches based on criteria like a specific prefix or ones that
contain a specific string.
• You can set a specific number of matches needed to include the facet, and
even whether to include a count of documents that are missing a value for
the field.

Faceting – Field/Value Example Site
• Back to our classic car site. When viewing our product results, we also want
to get a list of the current categories the product appears in and use that for
a side menu.
• For this example we’ll just look at how we do the top-level category menu.
• As with groups, you generally want to facet on a field that has very little
tokenizing, etc. on it so that Solr returns the original, unmodified values in
the field.
• You can sort the facets by the count (highest first) or alphabetically (index
sort).

Faceting – Field/Value Example
?q=front+hood&json.facet={
categories: {type:terms, field:categoryname}
}
Sample Return:
categories= [ { val = ‘Body Components’, count=50 },
{ val = ‘Hood Components’, count=10 } ]

Faceting – Range
• Another common facet method is range facets. You can use ranges on
any date or numeric field.
• A common use case is to return counts in various price ranges
• In addition to start and end parameters, you can further configure how
Solr will facet the ranges by setting a gap to divide by and how to
handle edge cases.
• You can also configure it to include counts for values that fall outside
the range.

Faceting – Range Example
json.facet={ price_ranges = {'type' ='range',
'field' = 'price',
'start'= 0,
'end'= 1000,
'gap'= 250 }; }
Result:
price_ranges = [ { val = 0, count=50 },
{ val = 250, count=125 },
{ val = 500, count=72 },
{ val = 100, count=52 } ]

Faceting – Use With Filters
• One issue you may run into with facets is when you use them to provide
filters for the customer.
• When you apply the filter, the other options drop out of your search, as well
as your facets.
• For example, if I drill down into a category, the result set only has that
category in it (and so the facet for categories has only that one category as
well). But what if I want to still show a menu of ALL available categories for
that search so the user can change categories?
• Your first thought might be that we’ll have to do another search without the
filter… but WAIT! We actually can do this in the same search request.

Faceting – Domain
• Solr 5.2+ allows you to include the domain for the facet. This allows you to
expand your faceting outside the “domain” of the main search.
• To do this, we’ll first add a name to the filter that selects the category.
fq={!tag=cat}categoryid:100
• Now when we set the facet for the category, we can tell it to ignore the filter
for the category:
json.facet={
categories: { type:terms,
field:categoryname,
domain: { 'excludeTags' = 'cat ' } }
}

Faceting – Pivot Facets
• Also known as decision trees, these are multi-level facets.
• Return counts for field ‘foo’ for each different field ‘bar’ and so forth.
• Pivot facets can be requested fairly simply in a query, just by passing
the list of fields to pivot on:
facet.pivot=category,subcategory,subsubcategory

Faceting – Subfacets
• Subfacets are a newer version of pivot facets (Solr 5+).
• With subfacets, you can nest any kind of facet under any other kind of
facet, with completely separate settings and criteria of its own.
• Designed specifically with JSON in mind.
• Response format easier to handle.
• Lots of other improvements to allow for more advanced sorting,
calcuations, etc. on all levels of the facets.

Faceting – Example Subfacets
json.facet={
categories: { type:terms,
field:categoryname,
domain: { 'excludeTags' = 'cat,subcat ' },
facet: { subcategories: { type:terms,
field:subcategoryname,
domain: { 'excludeTags' = 'cat,subcat ' }
}
}
}
}

Faceting – More Info
• http://yonik.com/json-facet-api/
• http://yonik.com/multi-select-faceting/
• http://yonik.com/solr-subfacets/
• https://lucidworks.com/blog/2014/10/03/pivot-facets-
inside-and-out/

Spellchecking
• You can specify in your search that you want to receive spelling
suggestions in the results.
• Generally the field you want to use for spellchecking will have minimal
tokenizing and stemming on it. You often will want to use a copy field
that can be used specifically for the spellchecking.
• There’s a lot of parameters to set for spellchecking, which I won’t go
over here, but you’ll probably need to play around with them to see
what will work best for your application. There is some performance hit
for returning spell suggestions so you may want to turn them off when
not needed.

Spellchecking (cont)
• You need to make sure the spellcheck component is enabled for the
request handler you are using. If you don’t intend to change the
spellcheck parameters at all during searches, you may want to just set
them as defaults in the request handler.
• Be aware that the spellchecking dictionary is not built automatically.
You can send the paramater spellcheck.build=true on the url to rebuild
it, or in the solr config you can set it to be rebuilt automatically on
commit and/or optimize. Building on commit is generally not a good
idea in production systems.

Highlighting
• You can also have Solr highlight the terms it matched in the search results.
• Again, there’s a lot of ways to customize the highlighter component, both
what it highlights as well as what it wraps the matches in.
• Typically you will want to use termVectors, termPositions, and termOffsets in
your schema definition for the field(s) you will highlight terms in, which allows
you to use the FastVectorHighlighter component or with the standard
highlighter will improve performance.
• With the FastVectorHighlighter you can customize it to highlight matches
with different colors or html classes. It also supports Unicode.

Highlighting (cont)
• The highlighting does not use the same tokenizer/stemming components as
the search.
• Part of what you are configuring for the highlighting is how many highlighted
“snippets” to return and how Solr is to find them and pull them out of your
text fields.
• Solr returns the snippets separately so you’ll generally have to do a search-
and-replace on the original field to put the highlighted terms into it.
• With the FastVectorHighlighter you will want to be sure to include a
Boundary Scanner which ensures that it doesn’t truncate words.

Suggestor
• Used to provide automatic suggestions for query terms (auto-suggest search
box).
• While technically you could use the spellchecker for this, the suggestor is
particularly developed for this use.
• As with the spellchecker, you can configure when the suggestor’s dictionary
is built and you typically will want to copy fields to a field type specifically set
up for this purpose that has minimal analysis on it.
• There are multiple kinds of dictionaries available to use for the suggestor,
and you can get suggestions from more than one in a single request.

MoreLikeThis
• Enables users to search for other documents similar to one in their
current results list.
• You can customize how this component works in many ways, from
which fields to use, number of documents to return, term frequency
requirements, minimum and maximum word lengths to use, boosting
and more.

But Wait, There’s More!
• Pagination and Cursors
• Query Re-Ranking
• Transforming Results
• Result Clustering (tag cloud)
• Spatial/Geospatial Searches
• Term and Term Vectors
Components
• Stats Component
• Caching
• Query Elevation
• RealTime Get
• Exporting Result Sets
• Distributed Search and Index
Sharding
• Content Streams
This covers only a portion of the features of Solr.
Some things we didn’t look at include:

Need More Help?
CFWebtools, LLC
11204 Davenport, Ste. 100
Omaha, NE 68154
402 408 3733 ext. 2
https://www.cfwebtools.com

Solr/Elasticsearch for CF Developers (and others)

Recomendados

Recomendados

Mais conteúdo relacionado

Semelhante a Solr/Elasticsearch for CF Developers (and others)

Semelhante a Solr/Elasticsearch for CF Developers (and others) (20)

Último

Último (20)

Solr/Elasticsearch for CF Developers (and others)