Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
1. Battle of the Giants
Rafał Kuć – Sematext Group, Inc.
@kucrafal @sematext sematext.com
2. I am a…
Sematext consultant & engineer
Solr Cookbook series author
„ElasticSearch Server” author
„Mastering ElasticSearch” author
Solr.pl co-founder
Father and husband
Copyright 2013 Sematext Group. Inc. All rights reserved
6. Expectations vs Reality
ElasticSearch:
Only ElasticSearch nodes
Single leader (master node)
Apache Solr:
Solr + ZooKeeper
Leader per shard
Distributed
Fault tolerant
Automatic leader election
7. All Time Top Committers
12. Collection vs Index
Collections and indices can be spread across different nodes in the cluster
Apache Solr: the collection is the main logical index
ElasticSearch: the index is the main logical structure
13. Apache Solr Index Structure
Fields and types defined in the schema
Automatic value copying (copy fields)
Dynamic fields
Custom similarity
Custom postings format
Multiple document types require a shared schema
Schema can be read using the API
14. ElasticSearch Index Structure
Schema-less
Fields and types defined with the HTTP API (mappings)
Multi-field support
Nested and parent-child documents
Custom similarity
Custom postings format
Multiple document types with different structures
Mappings can be read and written using the API
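As an illustration of defining fields through the HTTP API with multi-field support, here is a minimal sketch that builds a mapping body. It assumes the "multi_field" syntax of the ElasticSearch releases from the time of this talk. The type name "test", the field "title" and the sub-field "untouched" are hypothetical.

```python
import json

def multi_field_mapping(doc_type, field):
    """Mapping body (0.90-era "multi_field" syntax) that indexes one field
    twice: analyzed for full text search and not_analyzed for exact
    matches, sorting and faceting."""
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "multi_field",
                    "fields": {
                        # default sub-field keeps the original field name
                        field: {"type": "string", "index": "analyzed"},
                        # hypothetical sub-field, addressed as <field>.untouched
                        "untouched": {"type": "string", "index": "not_analyzed"},
                    },
                }
            }
        }
    }

body = multi_field_mapping("test", "title")
# Would be sent as: curl -XPUT 'localhost:9200/sematext/test/_mapping' -d '<body>'
print(json.dumps(body, indent=2))
```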
15. Shards and Replicas
Many shards
0 or more replicas
Replica can become leader
Replicas can be created on a live cluster
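This sketch shows how shard and replica counts are requested on each side. It builds the Solr Collections API CREATE URL and the ElasticSearch index creation body, plus the replica change that ElasticSearch accepts on a live index. Host names and the collection/index name "sematext" are placeholders.

```python
from urllib.parse import urlencode

def solr_create_collection_url(host, name, num_shards, replication_factor):
    # Collections API CREATE; replicationFactor counts every copy of a
    # shard, the leader included
    params = {"action": "CREATE", "name": name,
              "numShards": num_shards, "replicationFactor": replication_factor}
    return f"http://{host}/solr/admin/collections?{urlencode(params)}"

def es_create_index_body(shards, replicas):
    # Body for PUT /<index>; number_of_replicas counts copies besides the primary
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}

def es_update_replicas_body(replicas):
    # Body for PUT /<index>/_settings, accepted while the cluster is running
    return {"index": {"number_of_replicas": replicas}}

print(solr_create_collection_url("localhost:8983", "sematext", 2, 2))
print(es_create_index_body(2, 1))
print(es_update_replicas_body(2))
```

Note the different counting. Solr's replicationFactor=2 and ElasticSearch's number_of_replicas=1 both give two copies of every shard.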
16. Configuration
Apache Solr:
Static in solrconfig.xml
Can be reloaded with a core reload
ElasticSearch:
Static in elasticsearch.yml
Changeable at runtime
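The contrast above can be sketched as two requests. Solr picks up an edited solrconfig.xml through a CoreAdmin RELOAD. ElasticSearch takes dynamic settings such as refresh_interval directly over HTTP. The core name "collection1" and the 30s value are examples only.

```python
from urllib.parse import urlencode

def solr_core_reload_url(host, core):
    # CoreAdmin RELOAD re-reads solrconfig.xml and the schema of one core
    return f"http://{host}/solr/admin/cores?" + urlencode({"action": "RELOAD", "core": core})

def es_index_settings_body(settings):
    # Body for PUT /<index>/_settings; dynamic index settings such as
    # refresh_interval take effect on the running index
    return {"index": dict(settings)}

print(solr_core_reload_url("localhost:8983", "collection1"))
print(es_index_settings_body({"refresh_interval": "30s"}))
```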
18. Solr & ZooKeeper
Requires additional software
Prevents split-brain situations
Holds collection configurations
ZooKeeper ensemble needed
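Why an ensemble prevents split-brain comes down to majority arithmetic. A ZooKeeper ensemble of n servers keeps working only while floor(n/2) + 1 of them are reachable, so two sides of a network partition can never both qualify. A small sketch of that rule:

```python
def quorum(ensemble_size):
    # ZooKeeper keeps serving only while a strict majority of the ensemble is up
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    return ensemble_size - quorum(ensemble_size)

def both_sides_can_have_quorum(ensemble_size, side_a):
    # After a partition into side_a servers and the rest, at most one side
    # can hold a majority, which is what rules out split-brain
    side_b = ensemble_size - side_a
    return side_a >= quorum(ensemble_size) and side_b >= quorum(ensemble_size)

for n in (3, 4, 5):
    print(n, "servers: quorum", quorum(n), "tolerates", tolerated_failures(n), "failure(s)")
```

A four-server ensemble tolerates no more failures than a three-server one. This is why odd ensemble sizes are the usual choice.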
19. ElasticSearch Zen Discovery
Automatic node discovery
Multicast and unicast discovery methods
Automatic master detection
Two-way failure detection
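Zen discovery has no external coordinator, so split-brain protection depends on discovery.zen.minimum_master_nodes. That setting is not derived automatically and has to be set to a majority of the master-eligible nodes. This sketch renders an elasticsearch.yml fragment that switches to unicast discovery. The host names are hypothetical.

```python
def zen_discovery_config(master_eligible_hosts):
    # Majority of master-eligible nodes, so two halves of a partitioned
    # cluster cannot each elect their own master
    minimum_master_nodes = len(master_eligible_hosts) // 2 + 1
    hosts = ", ".join(f'"{h}"' for h in master_eligible_hosts)
    return "\n".join([
        "discovery.zen.ping.multicast.enabled: false",
        f"discovery.zen.ping.unicast.hosts: [{hosts}]",
        f"discovery.zen.minimum_master_nodes: {minimum_master_nodes}",
    ])

print(zen_discovery_config(["es1:9300", "es2:9300", "es3:9300"]))
```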
20. HTTP FTW
ElasticSearch: HTTP REST API with the JSON Query DSL, or a query string for simple queries
Apache Solr: HTTP with the query string
Both provide a dedicated Java API
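This sketch puts the request styles side by side. It builds a Solr select URL, an ElasticSearch URI search, and the equivalent Query DSL body. The collection, index and field names are placeholders.

```python
import json
from urllib.parse import urlencode

def solr_select_url(host, collection, q):
    # Apache Solr: the query and all options travel in the query string
    return f"http://{host}/solr/{collection}/select?" + urlencode({"q": q, "wt": "json"})

def es_uri_search_url(host, index, q):
    # ElasticSearch URI search: handy for simple queries
    return f"http://{host}/{index}/_search?" + urlencode({"q": q})

def es_dsl_body(field, text):
    # ElasticSearch Query DSL: JSON body sent to /<index>/_search
    return json.dumps({"query": {"match": {field: text}}})

print(solr_select_url("localhost:8983", "collection1", "title:solr"))
print(es_uri_search_url("localhost:9200", "sematext", "title:solr"))
print(es_dsl_body("title", "solr"))
```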
23. Full Text Search Capabilities
Variety of queries
Control score calculation
Different query parsers
Advanced Lucene queries
24. Score Calculation
Leverage Lucene scoring
Control importance of:
documents
queries
terms
phrases
Similarity configuration
25. Apache Solr and Score Influence
Index-time boosting
Query-time:
Term boosts
Field boosts
Phrase boosts
Function queries
Sub-queries used for boosting
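Most of the query-time controls listed above can be combined in one edismax request. This is a sketch only. The field names (title, body, published, category) and all boost values are hypothetical.

```python
from urllib.parse import urlencode

# Field names (title, body, published, category) are hypothetical
params = [
    ("q", "solr^2 elasticsearch"),                    # term boost on "solr"
    ("defType", "edismax"),
    ("qf", "title^10 body"),                          # field boosts
    ("pf", "title^5"),                                # phrase boost
    ("bf", "recip(ms(NOW,published),3.16e-11,1,1)"),  # function query favoring recent docs
    ("bq", "category:books^2"),                       # boosting sub-query
]
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```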
26. ElasticSearch and Score Influence
Index-time
Query-time
Different queries provide different boost controls
Can calculate distributed term frequencies
Negative and positive boosting queries
Custom score filters
Scripts
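Two of the controls above fit in one request. A boosting query demotes documents instead of excluding them. The dfs_query_then_fetch search type gathers term statistics from all shards before scoring. The field names and the value "obsolete" are hypothetical.

```python
import json

def boosting_query(match_field, match_text, demote_field, demote_value, negative_boost=0.2):
    # Documents matching "negative" are kept, but their score is multiplied
    # by negative_boost instead of them being filtered out
    return {
        "query": {
            "boosting": {
                "positive": {"match": {match_field: match_text}},
                "negative": {"term": {demote_field: demote_value}},
                "negative_boost": negative_boost,
            }
        }
    }

# dfs_query_then_fetch first collects term statistics from all shards,
# so scores use index-wide rather than per-shard frequencies
url = "http://localhost:9200/sematext/_search?search_type=dfs_query_then_fetch"
body = boosting_query("title", "solr", "category", "obsolete")
print(url)
print(json.dumps(body))
```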
27. ElasticSearch Query Rescore
Reorders the top N hits using another query
Executed on shards before results are returned to the node handling the request
Not executed with the scan and count search types
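A typical rescore pairs a cheap query for finding candidates with a more expensive phrase query. The phrase query only re-scores the top window_size hits on each shard. This sketch builds such a search body. The field "title" and the weights are illustrative assumptions.

```python
import json

def rescored_search(text, window_size=50):
    # The match query finds candidates; the phrase query re-scores only
    # the top window_size hits on each shard
    return {
        "query": {"match": {"title": text}},
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {"match_phrase": {"title": text}},
                "query_weight": 0.7,
                "rescore_query_weight": 1.2,
            },
        },
    }

body = rescored_search("battle of the giants")
print(json.dumps(body, indent=2))
```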
28. ElasticSearch Nested Objects
Indexed as separate documents
Stored in the same part of the index as the root document
Hidden from standard queries and filters
Need appropriate queries and filters (nested)
Top-level documents can be sorted on the basis of nested ones
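A nested query guarantees that all conditions hold for the same nested object. This sketch shows a nested mapping and a matching query. The product type, the "variants" field and its sub-fields are hypothetical.

```python
import json

# Hypothetical "variants" objects inside a product document
mapping = {
    "product": {
        "properties": {
            "variants": {
                "type": "nested",
                "properties": {
                    "color": {"type": "string", "index": "not_analyzed"},
                    "size": {"type": "string", "index": "not_analyzed"},
                },
            }
        }
    }
}

# Both conditions must hold for the same variant; without the nested type
# "Yellow" and "XL" could come from two different variants
query = {
    "query": {
        "nested": {
            "path": "variants",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"variants.color": "Yellow"}},
                        {"term": {"variants.size": "XL"}},
                    ]
                }
            },
        }
    }
}
print(json.dumps(query, indent=2))
```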
29. Solr Parent-Child Relationship
Used at query time
Multi-core joins possible
select?q={!join from=parent to=id}color:Yellow
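The join above works in three steps. It runs color:Yellow, collects the "parent" values of the matching documents, and returns the documents whose "id" holds one of those values. This sketch builds that query string, optionally with fromIndex for a cross-core join. The core names "collection1" and "variants" are hypothetical.

```python
from urllib.parse import urlencode

def join_query(from_field, to_field, inner_query, from_index=None):
    # Run inner_query, collect from_field values of the matches, then
    # return documents whose to_field holds one of those values
    local_params = f"!join from={from_field} to={to_field}"
    if from_index:
        # fromIndex runs the inner query against another core
        local_params += f" fromIndex={from_index}"
    return "{" + local_params + "}" + inner_query

q = join_query("parent", "id", "color:Yellow")
print(q)  # {!join from=parent to=id}color:Yellow
print("http://localhost:8983/solr/collection1/select?" + urlencode({"q": q}))
```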
30. ElasticSearch Parent-Child
Proper indexing required (parent given at index time)
Indexed as separate documents
Standard queries don’t return child documents
Dedicated queries and filters work across the relation (has_child and top_children return parents, has_parent returns children)
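The three pieces of ElasticSearch parent-child handling are a child mapping that names the parent type, children indexed with the parent id, and a has_child query that returns parents. This sketch builds each piece. The types "post" and "comment" and the field "author" are hypothetical.

```python
import json

# Hypothetical types: "post" is the parent, "comment" the child.
# Sent as PUT /sematext/comment/_mapping before any comment is indexed.
child_mapping = {"comment": {"_parent": {"type": "post"}}}

def index_child_request(child_id, parent_id, source):
    # The parent id is given at index time; it also routes the child to
    # the parent's shard
    return ("PUT", f"/sematext/comment/{child_id}?parent={parent_id}", source)

# Returns posts that have at least one comment written by "john"
has_child = {"query": {"has_child": {"type": "comment",
                                     "query": {"term": {"author": "john"}}}}}

print(index_child_request("1", "10", {"author": "john", "text": "Nice!"}))
print(json.dumps(has_child))
```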
31. Filters
Used to narrow down query results
Good candidates for caching and reuse
Apache Solr:
Additive (multiple filters are combined)
Can use different query parsers
Can use local params
Narrows down faceting results
ElasticSearch:
Defined using Query DSL
Can be used for score calculation
Doesn’t narrow down faceting results
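This sketch contrasts the two filter models. In Solr, several fq parameters are intersected, and tag/ex local params let a facet ignore a filter. In ElasticSearch, a filter inside a "filtered" query narrows both hits and facets, while a top-level "filter" narrows only the hits. The field names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Apache Solr: several fq parameters are intersected; tag/ex lets the
# color facet ignore the color filter
solr_params = [
    ("q", "*:*"),
    ("fq", "{!tag=colorFilter}color:Yellow"),
    ("fq", "inStock:true"),
    ("facet", "true"),
    ("facet.field", "{!ex=colorFilter}color"),
]
print("http://localhost:8983/solr/collection1/select?" + urlencode(solr_params))

# ElasticSearch: a filter inside a "filtered" query narrows hits and facets
filtered = {"query": {"filtered": {"query": {"match_all": {}},
                                   "filter": {"term": {"color": "Yellow"}}}},
            "facets": {"colors": {"terms": {"field": "color"}}}}
# A top-level "filter" narrows only the hits; facets still see every match
top_level = {"query": {"match_all": {}},
             "filter": {"term": {"color": "Yellow"}},
             "facets": {"colors": {"terms": {"field": "color"}}}}
print(json.dumps(top_level))
```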
32. Faceting
Terms
Range & query
Terms statistics
Spatial distance
Pivot
Histograms
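Several of the facet types above can be sketched as requests. Solr gets terms, range and pivot facets through parameters. The ElasticSearch body uses the facets API of that era with terms, terms statistics and a histogram. All field names and range bounds are hypothetical.

```python
from urllib.parse import urlencode

# Apache Solr: terms, range and pivot faceting through request parameters
solr_params = [
    ("q", "*:*"), ("facet", "true"),
    ("facet.field", "category"),                 # terms
    ("facet.range", "price"),                    # range
    ("facet.range.start", "0"), ("facet.range.end", "100"), ("facet.range.gap", "10"),
    ("facet.pivot", "category,inStock"),         # pivot
]
print(urlencode(solr_params))

# ElasticSearch facets: terms, terms statistics and histogram
es_body = {
    "query": {"match_all": {}},
    "facets": {
        "categories": {"terms": {"field": "category", "size": 10}},
        "price_per_category": {"terms_stats": {"key_field": "category", "value_field": "price"}},
        "price_histogram": {"histogram": {"field": "price", "interval": 10}},
    },
}
print(es_body)
```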
33. Real Time Or Not?
Get documents not yet visible to searches, straight from the transaction log
No searcher reopening needed
ElasticSearch: separate Get and Multi Get API
Apache Solr: separate Realtime Get handler
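This sketch builds the realtime get requests on both sides. It assumes the Solr realtime get handler is registered at /get, as in the example solrconfig.xml. The ids, index and type names are placeholders.

```python
import json
from urllib.parse import urlencode

# Apache Solr: realtime get handler (assumed at /get) reads uncommitted
# documents from the transaction log
solr_url = "http://localhost:8983/solr/collection1/get?" + urlencode({"ids": "12345,12346"})

# ElasticSearch: single get by id, and a multi get body for POST /_mget
es_get_url = "http://localhost:9200/sematext/test/12345"
es_mget_body = {"docs": [
    {"_index": "sematext", "_type": "test", "_id": "12345"},
    {"_index": "sematext", "_type": "test", "_id": "12346"},
]}
print(solr_url)
print(es_get_url)
print(json.dumps(es_mget_body))
```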
34. Data Handling
Single and batch indexing supported
ElasticSearch: JSON in / JSON out (and YAML)
Apache Solr: different formats allowed (XML, JSON, CSV, binary)
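Batch indexing uses different payload shapes. The ElasticSearch _bulk body is newline-delimited, with an action line followed by a source line per document and a trailing newline. The Solr JSON update handler takes a plain JSON array. This sketch builds both. Index, type and field names are placeholders.

```python
import json

def es_bulk_body(index, doc_type, docs):
    # One action line plus one source line per document, newline-delimited,
    # ending with a newline
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

def solr_json_update_body(docs):
    # Apache Solr JSON update: a plain array of documents with their ids
    return json.dumps([dict(source, id=doc_id) for doc_id, source in docs])

docs = [("1", {"title": "Solr"}), ("2", {"title": "ElasticSearch"})]
print(es_bulk_body("sematext", "test", docs), end="")
print(solr_json_update_body(docs))
```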
35. Partial Document Updates
Not based on LUCENE-3837
Server-side document reindexing
Both use versioning
Decreases network traffic
36. Apache Solr Partial Doc Update
Sent to the standard update handler
Requires the _version_ field
curl 'localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' -d '[ {
  "id" : "12345",
  "enabled" : {
    "set" : true
  }
} ]'
37. ElasticSearch Partial Doc Update
Dedicated endpoint exposed: _update
Supports parameters like routing, parent, replication, percolate, etc. (similar to the Index API)
Uses scripts to perform document updates
curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{
  "script" : "ctx._source.enabled = enabled",
  "params" : {
    "enabled" : true
  }
}'
39. ElasticSearch Indices REST API
Index
creation
deletion
closing and opening
refreshing
existence checking
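Each index operation listed above maps to an HTTP method and path. This sketch records that mapping, using the example index name "sematext".

```python
# HTTP method and path for each index operation
INDEX_OPERATIONS = {
    "create": ("PUT", "/{index}"),
    "delete": ("DELETE", "/{index}"),
    "close": ("POST", "/{index}/_close"),
    "open": ("POST", "/{index}/_open"),
    "refresh": ("POST", "/{index}/_refresh"),
    "exists": ("HEAD", "/{index}"),  # 200 if the index exists, 404 otherwise
}

def request_for(operation, index):
    method, path = INDEX_OPERATIONS[operation]
    return method, path.format(index=index)

for op in INDEX_OPERATIONS:
    print(op, *request_for(op, "sematext"))
```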
40. Apache Solr Shard Splitting
admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
41. Cluster State Monitoring
Multiple MBeans exposed via JMX
Multiple REST endpoints exposed to get different statistics
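On the Solr side, much of what JMX exposes can also be read over HTTP from the mbeans handler. This sketch builds that request, optionally limited to one category. The core name and the CACHE category are examples.

```python
from urllib.parse import urlencode

def solr_mbeans_url(host, core, category=None):
    # The mbeans handler reports the statistics of Solr's registered info beans
    params = {"stats": "true", "wt": "json"}
    if category:
        params["cat"] = category  # e.g. CACHE or QUERYHANDLER
    return f"http://{host}/solr/{core}/admin/mbeans?" + urlencode(params)

print(solr_mbeans_url("localhost:8983", "collection1", "CACHE"))
```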
42. ElasticSearch Statistics API
Health and state check
Nodes information
Cache statistics
Segments information
Index information
Mappings information
SPM – „One to rule them all”
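This sketch maps each statistic listed above to the endpoint that usually serves it. The paths are as we recall them for ElasticSearch releases of that era and may differ in other versions. The index name "sematext" is an example.

```python
# Statistic -> GET endpoint (index name "sematext" is an example)
STATS_ENDPOINTS = {
    "health": "/_cluster/health",
    "state": "/_cluster/state",
    "nodes": "/_nodes",
    "cache and index statistics": "/sematext/_stats",
    "segments": "/sematext/_segments",
    "mappings": "/sematext/_mapping",
}

for name, path in STATS_ENDPOINTS.items():
    print(f"{name:28} GET http://localhost:9200{path}")
```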
43. ElasticSearch Cluster Settings Update
Control
rebalancing
recovery
allocation
Change cluster configuration properties
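One setting per area listed above (rebalancing, recovery, allocation) is enough to show the shape of a cluster settings update. Transient values are dropped on a full cluster restart, while persistent ones are kept. The values and the "rack_id" node attribute are hypothetical.

```python
import json

def cluster_settings_update(transient=None, persistent=None):
    # Body for PUT /_cluster/settings
    body = {}
    if transient:
        body["transient"] = transient
    if persistent:
        body["persistent"] = persistent
    return body

body = cluster_settings_update(
    transient={
        # rebalancing: shards moved concurrently across the whole cluster
        "cluster.routing.allocation.cluster_concurrent_rebalance": 2,
        # recovery: concurrent shard recoveries per node
        "cluster.routing.allocation.node_concurrent_recoveries": 4,
    },
    persistent={
        # allocation: spread shard copies across values of a node attribute
        "cluster.routing.allocation.awareness.attributes": "rack_id",
    },
)
print(json.dumps(body, indent=2))
```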
44. ElasticSearch Custom Shard Allocation
Cluster level:
curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "cluster.routing.allocation.exclude._ip" : "192.168.2.1"
  }
}'
Index level:
curl -XPUT localhost:9200/sematext/_settings/ -d '{
  "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"
}'
45. Moving Shards and Replicas
Move shards between nodes on demand
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands" : [
{"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1",
"to_node" : "node2"}},
{"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}}
]
}'
47. And The Winner Is?
48. We Are Hiring!
Dig search?
Dig analytics?
Dig big data?
Dig performance?
Dig working with and in open source?
We’re hiring worldwide!
http://sematext.com/about/jobs.html
49.
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com
http://blog.sematext.com
ElasticSearch Server 25% off:
MREESS25
Thank You!