Battle of the Giants
Apache Solr 4.0 vs ElasticSearch 0.20

       Rafał Kuć – Sematext International
      @kucrafal @sematext sematext.com
Who Am I
•   "Solr 3.1 Cookbook" author (4.0 inc)
•   Sematext consultant & engineer
•   Solr.pl co-founder
•   Father and husband 




                 Copyright 2012 Sematext Int’l. All rights reserved
What Will I Talk About?




Under the Hood
• ElasticSearch 0.20
  – Apache Lucene 3.6.1
• Apache Solr 4.0
  – Apache Lucene 4.0




Architecture
• What we expect
  – Scalability
  – Fault tolerance
  – High availability
  – Features
• What we are also looking for
  – Manageability
  – Installation ease
  – Tools

ElasticSearch Cluster Architecture
•   Distributed
•   Fault tolerant
•   Only ElasticSearch nodes
•   Single leader
•   Automatic leader election




SolrCloud Cluster Architecture
•   Distributed
•   Fault tolerant
•   Apache Solr + ZooKeeper ensemble
•   Leader per shard
•   Automatic leader election




Collection vs Index
• Collection – Solr's main logical index
• Index – ElasticSearch's main logical structure
• Collections and Indices can be spread among
  different nodes in the cluster




Multiple Document Types in Index
• ElasticSearch - multiple document types in a
  single index

• Apache Solr - multiple document types in a
  single collection – shared schema.xml




Shards and Replicas
•   Index / Collection can have many shards
•   Each shard can have 0 or more replicas
•   Replicas are automatically updated
•   Replicas can be promoted to leaders when a
    leader shard goes off-line




Index and Query Routing
• Control where documents are going
• Control where queries are going
• Manual data distribution
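
The idea behind routing can be sketched in a few lines. This is only an illustration under stated assumptions: `zlib.crc32` stands in for whatever hash function each server uses internally, and `pick_shard` and `NUM_SHARDS` are hypothetical names, not part of either product.

```python
import zlib

NUM_SHARDS = 8  # hypothetical shard count for the illustration


def pick_shard(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value (by default the document ID) to a shard number.

    crc32 is only a stand-in for the servers' internal hash functions.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Without explicit routing, each document is placed by its own identifier,
# so a query has to be sent to every shard.
doc_ids = ["doc-1", "doc-2", "doc-3"]
placement = {doc_id: pick_shard(doc_id) for doc_id in doc_ids}

# With explicit routing, all documents sharing a routing value land on the
# same shard, so a query using that value needs to visit one shard only.
user_shard = pick_shard("user-1234")
```

The same function is used at index time and at query time, which is why a routed query can skip the shards that cannot hold matching documents.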




Querying Without Routing

[Diagram: an Application querying a Collection / Index made up of Shard 1 to Shard 8]
Query With Routing

[Diagram: an Application querying a Collection / Index made up of Shard 1 to Shard 8]
Routing Docs and Queries in Solr
• Requires some effort
• Defaults to hash based on document
  identifiers
• Can be turned off using
 solr.NoOpDistributingUpdateProcessorFactory

  <updateRequestProcessorChain>
   <processor class="solr.LogUpdateProcessorFactory" />
   <!-- Listed before RunUpdateProcessorFactory, which should be the last processor in the chain -->
   <processor class="solr.NoOpDistributingUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>


Routing Docs and Queries - ElasticSearch
• The routing parameter controls which shard a
  document or query is forwarded to
• Defaults to the document identifier
• Can be changed to any value
    curl -XPUT 'localhost:9200/sematext/test/1?routing=1234' -d '{
      "title" : "Test routing document"
    }'

    curl -XGET 'localhost:9200/sematext/test/_search?q=*&routing=1234'




Apache Solr Index Structure
•   Field types defined in schema.xml file
•   Fields defined in schema.xml file
•   Allows automatic value copying
•   Allows dynamic fields
•   Allows custom similarity definition




ElasticSearch Index Structure
•   Schema-less
•   Analyzers and filters defined with HTTP API
•   Fields defined with an HTTP request
•   Multi-field support
•   Allows nested documents
•   Allows parent-child relationships
•   Allows structured data

Index Structure Manipulation
• Possible to some extent in Solr as well as
  ElasticSearch
• ElasticSearch allows dynamic mapping
  updates (though not every change is allowed)




Aliasing
• Solr
  – Allows core aliasing
• ElasticSearch
  – Allows index aliasing
  – We can add a filter to an alias
  – We can add index routing
  – We can add search routing


Server Configuration
• Solr
  – Static in solrconfig.xml
  – Can be reloaded during runtime with
    collection/core reload
• ElasticSearch
  – Static in elasticsearch.yml
  – Properties can be changed during runtime
    (although not all) without reloading




ElasticSearch Gateway Module
• Your data time machine
• Stores indices and meta data
• Currently available:
  – Local
  – Shared FS
  – Hadoop
  – S3


Discovery
• Apache Solr uses ZooKeeper
• ElasticSearch uses Zen Discovery




ElasticSearch Zen Discovery
• Allows automatic node discovery
• Provides multicast and unicast discovery
  methods
• Automatic master detection
• Two-way failure detection




Apache Solr & Apache ZooKeeper
• Requires additional software
• ZooKeeper ensemble with 1+ ZooKeeper
  instances
• Prevents split-brain situations
• Holds collection configurations
• Solr needs to know the address of at least one
  of the ZooKeeper instances
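
ZooKeeper keeps serving requests only while a majority of the ensemble is reachable, which is what rules out split-brain. The arithmetic can be sketched as follows; `quorum_size` and `tolerated_failures` are hypothetical helper names used only for this illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of ZooKeeper instances that forms a majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one instance")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many instances can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for size in (1, 3, 4, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

A four-instance ensemble tolerates no more failures than a three-instance one, which is why odd ensemble sizes are the usual choice.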


API
• ElasticSearch: HTTP REST API, or a query string
  for simple queries
• Apache Solr: HTTP with a query string
• Both provide a specialized Java API
  – Apache Solr: SolrJ, including CloudSolrServer
  – ElasticSearch: TransportClient for remote
    connections



Apache Solr and Query String
• Queries are built of request parameters
• Some degree of structuring allowed (local
  params)
    curl 'http://localhost:8983/solr/select?q=text:weird&sort=date+desc'




ElasticSearch REST End-Points
• Simple queries built of request parameters
• Structured queries built as JSON objects
    curl -XGET 'localhost:9200/sematext/test/_search?q=_all:weird&sort=date:desc'

    curl -XGET 'localhost:9200/sematext/test/_search' -d '{
      "query" : {
        "term" : {
          "_all" : "weird"
        }
      },
      "sort" : [
        { "date" : { "order" : "desc" } }
      ]
    }'
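
A structured request body like this can also be assembled programmatically, which avoids mismatched braces. The sketch below only builds and prints the JSON; `build_search_body` is a hypothetical helper, and the field names are the ones used on this slide.

```python
import json


def build_search_body(field: str, value: str, sort_field: str, order: str = "desc") -> str:
    """Return a JSON search body with a term query and a sort clause.

    "query" and "sort" are sibling keys at the top level of the body.
    """
    body = {
        "query": {"term": {field: value}},
        "sort": [{sort_field: {"order": order}}],
    }
    return json.dumps(body, indent=2)


print(build_search_body("_all", "weird", "date"))
```

The printed text can be passed to curl with `-d`, or sent with any HTTP client.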

Data Handling
• Solr
  – Multiple formats allowed as input
  – Can return results in multiple formats
• ElasticSearch
  – JSON in / JSON out




Single or Batch
• Solr
  – Single or multiple documents per request
• ElasticSearch
  – Single document with a standard indexing call
  – _bulk end-point exposed for batch indexing
  – _bulk UDP end-point can be exposed for
    low-latency batch indexing


Partial Document Updates
• Not based on LUCENE-3837 proposed by
  Andrzej Białecki
• The document is reindexed on the search
  server side
• Both servers use versioning to prevent
  changes from being overwritten
• Can reduce network traffic in some
  cases
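
The versioning idea can be illustrated with a toy in-memory store. This is a sketch of optimistic concurrency control in general, not the implementation of either server; `VersionedStore` and `VersionConflict` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when an update was prepared against a stale version."""


class VersionedStore:
    """Toy in-memory store that keeps a version number per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, fields)

    def get(self, doc_id):
        return self._docs.get(doc_id, (0, {}))

    def partial_update(self, doc_id, changes, expected_version):
        version, fields = self.get(doc_id)
        if version != expected_version:
            raise VersionConflict(f"expected {expected_version}, found {version}")
        # The server merges the changes and reindexes the whole document.
        merged = {**fields, **changes}
        self._docs[doc_id] = (version + 1, merged)
        return version + 1


store = VersionedStore()
v1 = store.partial_update("12345", {"title": "Test"}, expected_version=0)
v2 = store.partial_update("12345", {"enabled": True}, expected_version=v1)
```

A client that still holds version 1 after the second update gets a `VersionConflict` instead of silently overwriting the newer change.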

ElasticSearch Partial Doc Update
• Special end-point exposed: _update
• Supports parameters like
  routing, parent, replication, percolate, etc.
  (similar to Index API)
• Uses scripts to perform document updates
   curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{
     "script" : "ctx._source.enabled = enabled",
     "params" : {
       "enabled" : true
     }
   }'

Apache Solr Partial Doc Update
• Sent to the standard update handler
• Requires _version_ field to be present

    curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[
      {
        "id" : "12345",
        "enabled" : {
          "set" : true
        }
      }
    ]'


Solr Collections API
• Built on top of Core Admin
• Allows:
  – Collection creation
  – Collection reload
  – Collection deletion




ElasticSearch Indices REST API
• Allows:
  – Index creation
  – Index deletion
  – Index closing and opening
  – Index refreshing
  – Existence checking




Analysis Chain Definition
• Solr
  – Static in schema.xml
  – Can be reloaded during runtime with
    collection/core reload
• ElasticSearch
  – Static in elasticsearch.yml
  – Defined during index/type creation with REST call
  – Possible to change with update mapping call
    (not all changes allowed)




Multilingual Data Handling
• Both ElasticSearch and Apache Solr built on
  top of Apache Lucene
• Solr – analyzers defined per field in schema.xml
  file
• ElasticSearch – analyzer defined in mappings,
  but can be set during query or specified on
  the basis of field values


Results Grouping
• Available in Apache Solr only
• Allows for results grouping based on:
  – Field value
  – Query
  – Function query (not available during distributed
    searching)




Prospective Search
• Allows for checking if a document matches a
  stored query
• Not available in Apache Solr
• Available in ElasticSearch under the name of
  Percolator
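
Prospective search turns the usual flow around: queries are stored first, and each incoming document is checked against them. The toy sketch below shows only that idea; it is not the Percolator API, and `matches`, `percolate`, and the stored queries are hypothetical.

```python
def matches(document: dict, query: dict) -> bool:
    """A stored query here is just a set of field -> required term pairs."""
    return all(
        term.lower() in str(document.get(field, "")).lower().split()
        for field, term in query.items()
    )


def percolate(document: dict, stored_queries: dict) -> list:
    """Return the names of all stored queries that the document matches."""
    return sorted(name for name, q in stored_queries.items() if matches(document, q))


stored_queries = {
    "solr-news": {"title": "solr"},
    "es-news": {"title": "elasticsearch"},
    "routing": {"title": "routing", "body": "shard"},
}
doc = {"title": "ElasticSearch routing explained", "body": "Pick one shard"}
print(percolate(doc, stored_queries))  # prints ['es-news', 'routing']
```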




Spellchecker
• Lets us check and correct spelling mistakes
• Not available in ElasticSearch currently
• Multiple implementations available in Apache
  Solr
  – IndexBasedSpellChecker
  – WordBreakSolrSpellChecker
  – DirectSolrSpellChecker


Full Text Search Capabilities
•   Variety of queries
•   Ability to control score calculation
•   Different query parsers available
•   Advanced Lucene queries (like SpanQueries)
    exposed




Score Calculation
•   Leverage Lucene scoring capabilities
•   Control over document importance
•   Control over query importance
•   Control over term and phrase importance




Apache Solr and Score Influence
• Index time
  – Document boosts
  – Field boosts
• Query time
  – Term boosts
  – Field boosts
  – Phrase boosts
  – Function queries

ElasticSearch and Score Influence
• Index time
  – Document and field boosts
• Query time
  – Different queries provide different boost controls
  – Can calculate distributed term frequencies
  – Negative and Positive boosting queries
  – Custom score filters
• Scripts
  – Control scoring with scripts

Nested Objects
• Possible only in ElasticSearch
• Indexed as separate documents
• Stored in the same part of the index as the
  root document
• Hidden from standard queries and filters
• Need appropriate queries and filters (nested)



More Like This
• Lets us find similar documents
• Solr
  – More Like This Component
• ElasticSearch
  – More Like This Query
  – More Like This Field Query
  – _mlt REST end-point



Solr Parent-Child Relationship
• Used at query time
• Multi core joins possible
      http://localhost:8983/solr/select?q={!join from=parent to=id}color:Yellow




ElasticSearch Parent-Child Handling
• Proper indexing required
• Indexed as separate documents
• Standard queries don’t return child
  documents
• In order to retrieve parent docs one should
  use appropriate queries and filters (has_child,
  has_parent, top_children)



Filters
•   Used to narrow down query results
•   Good candidates for caching and reuse
•   Supported by ElasticSearch and Apache Solr
•   Should be used for repeatable query elements




Apache Solr Filter Queries
•   Multiple filters per query
•   Filters are additive
•   Different query parsers can be used
•   Local params can be used
•   Narrow down faceting results
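
Combining several filters amounts to intersecting document sets: every filter further narrows the result. A minimal sketch, assuming each filter's matching document IDs are already cached as a set; `filter_cache` and `apply_filters` are hypothetical names and the IDs are made up for the illustration.

```python
from functools import reduce

# Hypothetical cached filter results: filter query -> set of matching doc IDs.
filter_cache = {
    "color:yellow": {1, 2, 3, 5, 8},
    "inStock:true": {2, 3, 4, 8, 9},
    "price:[0 TO 100]": {3, 8, 9, 10},
}


def apply_filters(query_hits: set, filters: list) -> set:
    """Intersect the main query's hits with every filter's document set."""
    return reduce(lambda hits, fq: hits & filter_cache[fq], filters, set(query_hits))


main_query_hits = {1, 2, 3, 4, 5, 8, 9}
narrowed = apply_filters(main_query_hits, ["color:yellow", "inStock:true"])
```

Because each filter is cached on its own, a later query that reuses only one of them still benefits from the cache.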




ElasticSearch Filtered Queries
• Can be defined using queries exposed by the
  Query DSL
• Can be used for custom score calculation (e.g.,
  the custom filters score query)
• Doesn’t narrow down faceting results by
  default (facets have their own filters)



Filter Cache Control
• Both Solr and ElasticSearch let us control
  cache for filters
• Solr
  – Using local params and cache property
• ElasticSearch
  – _cache property
  – _cache_key property




Faceting
• Both provide common facets
  – Terms
  – Range & query
  – Terms statistics
  – Spatial distance
• Solr
  – Pivot faceting
• ElasticSearch
  – Histograms

Real Time or Not?
• Both allow getting documents that are not yet
  visible to searchers
• Neither needs searcher reopening
• ElasticSearch
  – Separate Get and Multi Get APIs
• Apache Solr
  – Separate Realtime Get Handler
  – Can be used as a search component


Caches and Warming
• ElasticSearch and Solr allow caching
• Both allow running warming queries
• ElasticSearch by default doesn’t limit cache
  sizes




Solr Caches
• Types
   – Filter Cache
   – Query Result Cache
   – Document Cache
• Implementation choices
   – LRUCache
   – FastLRUCache
   – LFUCache
• Other configuration options:
   – Size
   – Maximum size
   – Autowarming count
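
The least-recently-used eviction policy behind a cache with a maximum size can be sketched as follows. This shows the policy only and is not Solr's own cache class; the `LRUCache` class and the cache keys here are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with a fixed maximum number of entries."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used


cache = LRUCache(max_size=2)
cache.put("fq=color:yellow", {1, 2})
cache.put("fq=inStock:true", {2, 3})
cache.get("fq=color:yellow")           # touch the first entry
cache.put("fq=price:[0 TO 100]", {3})  # evicts "fq=inStock:true"
```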


ElasticSearch Caches
• Types
  – Filter Cache
  – Field Data Cache
• Implementation choices
  – Resident
  – Soft
  – Weak
• Other configuration options:
  – Max size (entries per segment)
  – Expiration time

Cluster State Monitoring
• Apache Solr – multiple MBeans exposed via
  JMX
• ElasticSearch – multiple REST end-points
  exposed to get different statistics




ElasticSearch Statistics API
•   Health and State Check
•   Nodes Information and Statistics
•   Cache Statistics
•   Index Segments Information
•   Index Information and Statistics
•   Mappings Information



Cluster Monitoring




Cluster Monitoring with SPM




Cluster Settings Update
• ElasticSearch lets us:
  – Control rebalancing
  – Control recovery
  – Control allocation
  – Change the above on the live cluster




Custom Shard Allocation
• Possible in ElasticSearch
• Cluster level:
  curl -XPUT localhost:9200/_cluster/settings -d '{
     "persistent" : {
       "cluster.routing.allocation.exclude._ip" : "192.168.2.1"
     }
  }'

• Index level:
  curl -XPUT 'localhost:9200/sematext/_settings' -d '{
     "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"
  }'




Moving Shards and Replicas
• Possible in ElasticSearch, not available in Solr
• Lets us move shards and replicas to any
  node in the cluster on demand
• Example:
  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
   "commands" : [
    {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}},
    {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}}
   ]
  }'




And the Winner Is?




How to Reach Us
• Rafał Kuć
  – Twitter: @kucrafal
  – E-mail: rafal.kuc@sematext.com
• Sematext
  – Twitter: @sematext
  – Website: http://sematext.com
• Solr vs ElasticSearch series:
  • http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

We Are Hiring!
•   Dig Search?
•   Dig Analytics?
•   Dig Big Data?
•   Dig Performance?
•   Dig working with and in open-source?
•   We’re hiring worldwide!
       http://sematext.com/about/jobs.html


Mais conteúdo relacionado

Mais procurados

Node.js und die Oracle-Datenbank
Node.js und die Oracle-DatenbankNode.js und die Oracle-Datenbank
Node.js und die Oracle-DatenbankCarsten Czarski
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr Tommaso Teofili
 
eZ Find workshop: advanced insights & recipes
eZ Find workshop: advanced insights & recipeseZ Find workshop: advanced insights & recipes
eZ Find workshop: advanced insights & recipesPaul Borgermans
 
Taking eZ Find beyond full-text search
Taking eZ Find beyond  full-text searchTaking eZ Find beyond  full-text search
Taking eZ Find beyond full-text searchPaul Borgermans
 
Find it, possibly also near you!
Find it, possibly also near you!Find it, possibly also near you!
Find it, possibly also near you!Paul Borgermans
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013Roy Russo
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4thelabdude
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning ElasticsearchAnurag Patel
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studyCharlie Hull
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xGrant Ingersoll
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
How SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded EnvironmentHow SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded Environmentlucenerevolution
 

Mais procurados (20)

Oak / Solr integration
Oak / Solr integrationOak / Solr integration
Oak / Solr integration
 
Node.js und die Oracle-Datenbank
Node.js und die Oracle-DatenbankNode.js und die Oracle-Datenbank
Node.js und die Oracle-Datenbank
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr
 
eZ Find workshop: advanced insights & recipes
eZ Find workshop: advanced insights & recipeseZ Find workshop: advanced insights & recipes
eZ Find workshop: advanced insights & recipes
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 
Taking eZ Find beyond full-text search
Taking eZ Find beyond  full-text searchTaking eZ Find beyond  full-text search
Taking eZ Find beyond full-text search
 
Find it, possibly also near you!
Find it, possibly also near you!Find it, possibly also near you!
Find it, possibly also near you!
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
 
Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
Solr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance studySolr and Elasticsearch, a performance study
Solr and Elasticsearch, a performance study
 
What's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.xWhat's new in Lucene and Solr 4.x
What's new in Lucene and Solr 4.x
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
How SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded EnvironmentHow SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded Environment
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 

Semelhante a BigData Faceted Search Comparison between Apache Solr vs. ElasticSearch

Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextRafał Kuć
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1Stefan Schmidt
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunktdthomassld
 
ApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big DataApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big DataOpenSource Connections
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
hibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfhibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfPatiento Del Mar
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesRahul Singh
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesAnant Corporation
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr PerformanceLucidworks
 
Not Just ORM: Powerful Hibernate ORM Features and Capabilities
Not Just ORM: Powerful Hibernate ORM Features and CapabilitiesNot Just ORM: Powerful Hibernate ORM Features and Capabilities
Not Just ORM: Powerful Hibernate ORM Features and CapabilitiesBrett Meyer
 

Semelhante a BigData Faceted Search Comparison between Apache Solr vs. ElasticSearch (20)

Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - Sematext
 
Solr
SolrSolr
Solr
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1New Persistence Features in Spring Roo 1.1
New Persistence Features in Spring Roo 1.1
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunk
 
ApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big DataApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big Data
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Apache solr
Apache solrApache solr
Apache solr
 
hibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdfhibernateormfeatures-140223193044-phpapp02.pdf
hibernateormfeatures-140223193044-phpapp02.pdf
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Big Search with Big Data Principles
Big Search with Big Data PrinciplesBig Search with Big Data Principles
Big Search with Big Data Principles
 
Not Just ORM: Powerful Hibernate ORM Features and Capabilities
Not Just ORM: Powerful Hibernate ORM Features and CapabilitiesNot Just ORM: Powerful Hibernate ORM Features and Capabilities
Not Just ORM: Powerful Hibernate ORM Features and Capabilities
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 

BigData Faceted Search Comparison between Apache Solr vs. ElasticSearch

  • 1. Battle of the Giants Apache Solr 4.0 vs ElasticSearch 0.20 Rafał Kuć – Sematext International @kucrafal @sematext sematext.com
  • 2. Who Am I • „Solr 3.1 Cookbook” author (4.0 inc) • Sematext consultant & engineer • Solr.pl co-founder • Father and husband  Copyright 2012 Sematext Int’l. All rights reserved
  • 3. What Will I Talk About ?
  • 4. Under the Hood • ElasticSearch 0.20 – Apache Lucene 3.6.1 • Apache Solr 4.0 – Apache Lucene 4.0
  • 5. Architecture • What we expect – Scalability – Fault tolerance – High availability – Features • What we are also looking for – Manageability – Installation ease – Tools
  • 6. ElasticSearch Cluster Architecture • Distributed • Fault tolerant • Only ElasticSearch nodes • Single leader • Automatic leader election
  • 7. SolrCloud Cluster Architecture • Distributed • Fault tolerant • Apache Solr + ZooKeeper ensemble • Leader per shard • Automatic leader election
  • 8. Collection vs Index • Collection – Solr main logical index • Index – ElasticSearch main logical structure • Collections and Indices can be spread among different nodes in the cluster
  • 9. Multiple Document Types in Index • ElasticSearch - multiple document types in a single index • Apache Solr - multiple document types in a single collection – shared schema.xml
  • 10. Shards and Replicas • Index / Collection can have many shards • Each shard can have 0 or more replicas • Replicas are automatically updated • Replicas can be promoted to leaders when a leader shard goes off-line
  • 11. Index and Query Routing • Control where documents are going • Control where queries are going • Manual data distribution
  • 12. Querying Without Routing [Diagram: Application, Collection / Index, Shards 1 to 8]
  • 13. Query With Routing [Diagram: Application, Collection / Index, Shards 1 to 8]
  • 14. Routing Docs and Queries in Solr • Requires some effort • Defaults to hash based on document identifiers • Can be turned off using solr.NoOpDistributingUpdateProcessorFactory <updateRequestProcessorChain> <processor class="solr.LogUpdateProcessorFactory" /> <processor class="solr.RunUpdateProcessorFactory" /> <processor class="solr.NoOpDistributingUpdateProcessorFactory" /> </updateRequestProcessorChain>
  • 15. Routing Docs and Queries - ElasticSearch • routing parameter controls the target shard to which a document/query will be forwarded • defaults to document identifiers • can be changed to any value curl -XPUT 'localhost:9200/sematext/test/1?routing=1234' -d '{ "title" : "Test routing document" }' curl -XGET 'localhost:9200/sematext/test/_search/?q=*&routing=1234'
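The idea behind the routing slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pick_shard` is a hypothetical helper, and CRC32 is a stand-in for whatever hash function the servers actually use. The point is that a stable hash of the routing value, taken modulo the shard count, sends a document and a later query with the same routing value to the same shard.

```python
import zlib


def pick_shard(routing_value: str, num_shards: int) -> int:
    """Map a routing value to a shard number with a stable hash.

    CRC32 is an assumption made for this sketch, not the servers' real hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


NUM_SHARDS = 8  # the routing diagrams show eight shards

# A document indexed with routing=1234 lands on one specific shard...
doc_shard = pick_shard("1234", NUM_SHARDS)
# ...and a query sent with the same routing value targets that same shard,
# so the other seven shards do not have to be visited.
query_shard = pick_shard("1234", NUM_SHARDS)
```

Without a routing value the query has no way to know which shard holds the matching documents, which is why the non-routed case fans out to every shard.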
  • 16. Apache Solr Index Structure • Field types defined in schema.xml file • Fields defined in schema.xml file • Allows automatic value copying • Allows dynamic fields • Allows custom similarity definition
  • 17. ElasticSearch Index Structure • Schema-less • Analyzers and filters defined with HTTP API • Fields defined with an HTTP request • Multi-field support • Allows nested documents • Allows parent-child relationship • Allows structured data
  • 18. Index Structure Manipulation • Possible to some extent in Solr as well as ElasticSearch • ElasticSearch allows dynamic mappings update (not always)
  • 19. Aliasing • Solr – Allows core aliasing • ElasticSearch – Allows index aliasing – We can add filter to alias – We can add index routing – We can add search routing
  • 20. Server Configuration • Solr – Static in solrconfig.xml – Can be reloaded during runtime with collection/core reload • ElasticSearch – Static in elasticsearch.yml – Properties can be changed during runtime (although not all) without reloading
  • 21. ElasticSearch Gateway Module • Your data time machine • Stores indices and meta data • Currently available: – Local – Shared FS – Hadoop – S3
  • 22. Discovery • Apache Solr uses ZooKeeper • ElasticSearch uses Zen Discovery
  • 23. ElasticSearch Zen Discovery • Allows automatic node discovery • Provides multicast and unicast discovery methods • Automatic master detection • Two-way failure detection
  • 24. Apache Solr & Apache ZooKeeper • Requires additional software • ZooKeeper ensemble with 1+ ZooKeeper instances • Prevents split-brain situations • Holds collections configurations • Solr needs to know address of one of the ZooKeeper instances
  • 25. API • HTTP REST API in ElasticSearch or Query String for simple queries • HTTP with Query String in Apache Solr • Both provide specialized Java API – SolrJ for Apache Solr and CloudSolrServer – ElasticSearch with TransportClient for remote connections
  • 26. Apache Solr and Query String • Queries are built of request parameters • Some degree of structuring allowed (local params) curl 'http://localhost:8983/solr/select?q=text:weird&sort=date+desc'
  • 27. ElasticSearch REST End-Points • Simple queries built of request parameters • Structured queries built as JSON objects curl -XGET 'localhost:9200/sematext/test/_search/?q=_all:weird&sort=date:desc' curl -XGET 'localhost:9200/sematext/test/_search' -d '{ "query" : { "term" : { "_all" : "weird" } }, "sort" : { "date" : { "order" : "desc" } } }'
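Hand-written JSON request bodies are easy to get wrong (a misplaced brace is enough), so it can help to build the structured query as a data structure and serialize it. The sketch below assumes the body shape shown on the slide, with `query` and `sort` as sibling keys; `build_search_body` is a hypothetical helper, not part of any client library, and nothing is sent over the network.

```python
import json


def build_search_body(field: str, term: str, sort_field: str, order: str = "desc") -> str:
    """Serialize a term query plus a sort clause as a JSON request body."""
    body = {
        "query": {"term": {field: term}},
        "sort": {sort_field: {"order": order}},
    }
    return json.dumps(body)


# Same query as on the slide: term "weird" in _all, newest documents first.
body = build_search_body("_all", "weird", "date")
print(body)
```

The resulting string could be passed to curl with `-d`, or to any HTTP client, in place of the inline JSON.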
  • 28. Data Handling • Solr – Multiple formats allowed as input – Can return results in multiple formats • ElasticSearch – JSON in / JSON out
  • 29. Single or Batch • Solr – Single or multiple documents per request • ElasticSearch – Single document with a standard indexing call – _bulk end-point exposed for batch indexing – _bulk UDP end-point can be exposed for low latency batch indexing
  • 30. Partial Document Updates • Not based on LUCENE-3837 proposed by Andrzej Białecki • Document reindexing on the side of search server • Both servers use versioning to prevent changes being overwritten • Can lead to decreased network traffic in some cases
  • 31. ElasticSearch Partial Doc Update • Special end-point exposed - _update • Supports parameters like routing, parent, replication, percolate, etc. (similar to Index API) • Uses scripts to perform document updates curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true } }'
  • 32. Apache Solr Partial Doc Update • Sent to the standard update handler • Requires _version_ field to be present curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]'
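The partial update slides make two points: the server rebuilds the whole document from the stored one plus the changes, and versioning keeps a stale writer from overwriting newer data. A toy Python sketch of that behaviour follows. `ToyStore` and `VersionConflict` are hypothetical names invented for the illustration; this is an in-memory stand-in, not the code path of either server.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class ToyStore:
    """Hypothetical in-memory stand-in for a search server; not a real client."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, fields)

    def index(self, doc_id, fields):
        """Store a full document and bump its version."""
        version = self._docs.get(doc_id, (0, {}))[0] + 1
        self._docs[doc_id] = (version, dict(fields))
        return version

    def partial_update(self, doc_id, changes, expected_version):
        """Apply changes only if nobody else updated the document in between."""
        version, fields = self._docs[doc_id]
        if version != expected_version:
            raise VersionConflict(f"expected {expected_version}, found {version}")
        # Server-side reindexing: merge the stored document with the changes.
        merged = {**fields, **changes}
        self._docs[doc_id] = (version + 1, merged)
        return version + 1

    def get(self, doc_id):
        return self._docs[doc_id]


store = ToyStore()
v1 = store.index("12345", {"title": "Test", "enabled": False})
v2 = store.partial_update("12345", {"enabled": True}, expected_version=v1)
```

Only the changed field travels from the client, which is where the reduced network traffic mentioned on the slide comes from; a second update that still presents `v1` would be rejected.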
  • 33. Solr Collections API • Built on top of Core Admin • Allows: – Collection creation – Collection reload – Collection deletion
  • 34. ElasticSearch Indices REST API • Allows: – Index creation – Index deletion – Index closing and opening – Index refreshing – Existence checking
  • 35. Analysis Chain Definition • Solr – Static in schema.xml – Can be reloaded during runtime with collection/core reload • ElasticSearch – Static in elasticsearch.yml – Defined during index/type creation with REST call – Possible to change with update mapping call (not all changes allowed)
  • 36. Multilingual Data Handling • Both ElasticSearch and Apache Solr built on top of Apache Lucene • Solr – analyzers defined per field in schema.xml file • ElasticSearch – analyzer defined in mappings, but can be set during query or specified on the basis of field values
  • 37. Results Grouping • Available in Apache Solr only • Allows for results grouping based on: – Field value – Query – Function query (not available during distributed searching)
  • 38. Prospective Search • Allows for checking if a document matches a stored query • Not available in Apache Solr • Available in ElasticSearch under the name of Percolator
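Prospective search inverts the usual flow: queries are stored first, and each incoming document is checked against them. The Python sketch below shows only that inversion. It is not how the Percolator is implemented; as a simplifying assumption, each stored query is reduced to a list of terms that must all appear in the document, and `matching_queries` is a hypothetical helper.

```python
def matching_queries(stored_queries: dict, document: dict) -> list:
    """Return the names of stored queries whose terms all occur in the document."""
    # Flatten all field values into a set of lowercase words.
    words = set(" ".join(str(v) for v in document.values()).lower().split())
    return sorted(
        name
        for name, terms in stored_queries.items()
        if all(term.lower() in words for term in terms)
    )


# Queries registered ahead of time (names and terms are made up for the example).
stored = {
    "solr-alerts": ["solr"],
    "routing-alerts": ["routing", "document"],
}

# A new document arrives and is matched against every stored query.
hits = matching_queries(stored, {"title": "Test routing document"})
```

Here only `routing-alerts` matches, since the document contains both of its terms and does not contain "solr".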
  • 39. Spellchecker • Allows checking and correcting spelling mistakes • Not available in ElasticSearch currently • Multiple implementations available in Apache Solr – IndexBasedSpellChecker – WordBreakSolrSpellChecker – DirectSolrSpellChecker
  • 40. Full Text Search Capabilities • Variety of queries • Ability to control score calculation • Different query parsers available • Advanced Lucene queries (like SpanQueries) exposed
  • 41. Score Calculation • Leverage Lucene scoring capabilities • Control over document importance • Control over query importance • Control over term and phrase importance
  • 42. Apache Solr and Score Influence • Index time – Document boosts – Field boosts • Query time – Term boosts – Field boosts – Phrases boost – Function queries
  • 43. ElasticSearch and Score Influence • Index time – Document and field boosts • Query time – Different queries provide different boost controls – Can calculate distributed term frequencies – Negative and Positive boosting queries – Custom score filters • Scripts – Control scoring with scripts
  • 44. Nested Objects • Possible only in ElasticSearch • Indexed as separate documents • Stored in the same part of the index as the root document • Hidden from standard queries and filters • Need appropriate queries and filters (nested)
  • 45. More Like This • Lets us find similar documents • Solr – More Like This Component • ElasticSearch – More Like This Query – More Like This Field Query – _mlt REST end-point
  • 46. Solr Parent-Child Relationship • Used at query time • Multi core joins possible http://localhost:8983/solr/select?q={!join from=parent to=id}color:Yellow
  • 47. ElasticSearch Parent-Child Handling • Proper indexing required • Indexed as separate documents • Standard queries don’t return child documents • In order to retrieve parent docs one should use appropriate queries and filters (has_child, has_parent, top_children)
  • 48. Filters • Used to narrow down query results • Good candidates for caching and reuse • Supported by ElasticSearch and Apache Solr • Should be used for repeatable query elements
  • 49. Apache Solr Filter Queries • Multiple filters per query • Filters are additive • Different query parsers can be used • Local params can be used • Narrow down faceting results
  • 50. ElasticSearch Filtered Queries • Can be defined using queries exposed by the Query DSL • Can be used for custom score calculation (i.e., custom filters score query) • Doesn’t narrow down faceting results by default (facets have their own filters)
  • 51. Filter Cache Control • Both Solr and ElasticSearch let us control cache for filters • Solr – Using local params and cache property • ElasticSearch – _cache property – _cache_key property
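On the Solr side, the filter slides translate into repeated `fq` request parameters, with caching controlled per filter through the `cache` local param. The Python sketch below only builds and re-parses such a URL; it sends nothing. `solr_select_url` is a hypothetical helper, the host and the `/select` path follow the earlier slides, and the field names are reused from the slides' examples.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_select_url(base: str, query: str, filters: list) -> str:
    """Build a /select URL; each filter becomes its own fq parameter."""
    params = [("q", query)] + [("fq", f) for f in filters]
    return f"{base}/select?{urlencode(params)}"


url = solr_select_url(
    "http://localhost:8983/solr",
    "text:weird",
    [
        "color:Yellow",                # reusable filter, a good caching candidate
        "{!cache=false}enabled:true",  # local param asks Solr not to cache this one
    ],
)

# Round-trip the query string to check that both filters survived URL encoding.
parsed = parse_qs(urlsplit(url).query)
print(url)
```

Because every `fq` further narrows the result set, keeping each repeatable condition in its own parameter lets it be cached and reused independently of the main query.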
  • 52. Faceting • Both provide common facets – Terms – Range & query – Terms statistics – Spatial distance • Solr – Pivot faceting • ElasticSearch – Histograms
  • 53. Real Time Or Not ? • Allow getting documents not yet indexed • Don’t need searcher reopening • ElasticSearch – Separate Get and Multi Get APIs • Apache Solr – Separate Realtime Get Handler – Can be used as a search component
  • 54. Caches and Warming • ElasticSearch and Solr allow caching • Both allow running warming queries • ElasticSearch by default doesn’t limit cache sizes
  • 55. Solr Caches • Types – Filter Cache – Query Result Cache – Document Cache • Implementation choices – LRUCache – FastLRUCache – LFUCache • Other configuration options: – Size – Maximum size – Autowarming count
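For readers unfamiliar with the eviction policy behind the LRU cache names, here is a small Python sketch of a size-bounded least-recently-used cache. It is an illustration of the policy only: `ToyLRUCache` is a hypothetical class, the cache keys are made up, and none of this reflects how Solr's cache classes are actually written.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Hypothetical size-bounded cache that evicts the least recently used entry."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # drop the least recently used entry


cache = ToyLRUCache(max_size=2)
cache.put("fq:color:Yellow", {1, 2, 3})
cache.put("fq:enabled:true", {2, 3})
cache.get("fq:color:Yellow")                 # touch: now the most recent entry
cache.put("fq:date:[NOW-1DAY TO NOW]", {3})  # over capacity: evicts fq:enabled:true
```

The maximum size is what keeps memory use bounded; an unbounded cache, such as the default described on the previous slide for ElasticSearch, never reaches the eviction branch.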
  • 56. ElasticSearch Caches • Types – Filter Cache – Field Data Cache • Implementation choices – Resident – Soft – Weak • Other configuration options: – Max size (entries per segment) – Expiration time
  • 57. Cluster State Monitoring • Apache Solr – multiple MBeans exposed by JMX • ElasticSearch – multiple REST end-points exposed to get different statistics
  • 58. ElasticSearch Statistics API • Health and State Check • Nodes Information and Statistics • Cache Statistics • Index Segments Information • Index Information and Statistics • Mappings Information
  • 59. Cluster Monitoring
  • 60. Cluster Monitoring with SPM
  • 61. Cluster Settings Update • ElasticSearch lets us: – Control rebalancing – Control recovery – Control allocation – Change the above on the live cluster
  • 62. Custom Shard Allocation • Possible in ElasticSearch • Cluster level: curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.exclude._ip" : "192.168.2.1" } }' • Index level: curl -XPUT localhost:9200/sematext/ -d '{ "index.routing.allocation.include.tag" : "nodeOne,nodeTwo" }'
  • 63. Moving Shards and Replicas • Possible in ElasticSearch, not available in Solr • Allows moving shards and replicas to any node in the cluster on demand • Available in ElasticSearch: curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}} ] }'
  • 64. And The Winner Is ?
  • 65. How to Reach Us • Rafał Kuć – Twitter: @kucrafal – E-mail: rafal.kuc@sematext.com • Sematext – Twitter: @sematext – Website: http://sematext.com • Solr vs ElasticSearch series: • http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
  • 66. We Are Hiring ! • Dig Search ? • Dig Analytics ? • Dig Big Data ? • Dig Performance ? • Dig working with and in open-source ? • We’re hiring worldwide ! http://sematext.com/about/jobs.html