3. what is it?
Elasticsearch is a flexible and powerful
open source, distributed, real-time search
and analytics engine.
elasticsearch.org/overview/
4. talk disclaimers
• introduction to ES (sorry, no heavy stuff)
• focused on Elasticsearch itself (not so much
on integration with Kibana, Logstash, etc)
• heavily based on Andrew Cholakian’s book
Exploring Elasticsearch
• Tiririca method
• not all disclaimers have necessarily been
disclaimed
6. buzzword driven slide
• real time analytics
• conflict management
• per-operation
persistence
• document oriented
• built on top of
Apache Lucene™
• Apache 2 Open
Source License
• real time data
• distributed
• multi-tenancy
• RESTful API
• schema free
• full text search
• high availability
10. use cases
• search a large number of product descriptions for
a specific phrase and return the best results
• search for words that sound like a given word
• auto-complete a search box based on previously issued
searches, allowing for misspellings
• store large quantities of semi-structured (JSON)
data in a distributed fashion, with redundancy
14. don’t use cases
• calculate how many items are left in an inventory
• figure out the sum of all items in a given month’s
invoices
• execute operations transactionally with rollback
support
• guarantee item uniqueness across multiple fields
15. history
2004: Shay Banon creates Compass (Java
search engine framework)
2009: big parts of Compass would need to
be rewritten to release a third version
focused on scalability
Feb 2010: Elasticsearch 0.4.0
Mar 2012: Elasticsearch 0.19.0
Apr 2013: Elasticsearch 0.90.0
Feb 2014: Elasticsearch 1.0.0
Mar 2014: Elasticsearch 1.1.0
17. JSON over HTTP
• primary data format for ES is JSON
• main protocol consists of HTTP requests with
JSON payload
• _id is unique, and generated automatically if
unassigned
• internally, JSON is converted to flat fields for
Lucene’s key/value API
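To make the "flat fields" bullet concrete, here is a minimal sketch of flattening a nested JSON object into dotted field names. The `flatten` helper is hypothetical and only illustrates the idea; it is not Elasticsearch's or Lucene's actual code.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into a single level with dotted field names.

    Illustrative only: a hypothetical helper, not the real ES/Lucene logic.
    """
    flat = {}
    for key, value in doc.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten(value, prefix=name + "."))
        else:
            # Scalars and arrays are kept as the value of the flat field.
            flat[name] = value
    return flat


song = {
    "title": "The Vampyre of Time and Memory",
    "album": {"title": "...Like Clockwork", "year": 2013},
}
flat = flatten(song)
```

With this sketch, the nested `album` object ends up as the two flat fields `album.title` and `album.year`.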
19. documents
• like a row in a table in an RDB
• JSON objects
• each is stored in an index, has a type and an
id
• each contains zero or more fields
20. sample document
PUT /music/songs/1
{
  "_id" : 1,
  "title" : "The Vampyre of Time and Memory",
  "author" : "Queens of the Stone Age",
  "album" : {
    "title" : "...Like Clockwork",
    "year" : 2013,
    "track" : 3
  },
  "genres" : ["alternative rock", "piano rock"]
}
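A rough sketch of how the request above could be built from Python using only the standard library. The `build_index_request` helper and the `http://localhost:9200` base URL are assumptions for illustration (9200 is the default ES HTTP port); the request is only constructed here, not sent. The id is carried in the URL, so this sketch leaves `_id` out of the body.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc_id, document,
                        base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /index/type/id request with a JSON body.

    Hypothetical helper; base_url is an assumed local node.
    """
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


song = {
    "title": "The Vampyre of Time and Memory",
    "author": "Queens of the Stone Age",
    "album": {"title": "...Like Clockwork", "year": 2013, "track": 3},
    "genres": ["alternative rock", "piano rock"],
}
request = build_index_request("music", "songs", 1, song)
# To actually send it against a running node:
# urllib.request.urlopen(request)
```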
22. types
type       definition
string     text
integer    32-bit integers
long       64-bit integers
float      IEEE floats
double     double precision floats
boolean    true or false
date       UTC Date/Time
geo_point  latitude/longitude
null       the value null
array      any field
object     type omitted, properties field
nested     separate document
23. mapping
• defines the types of a document’s fields
• and the way they are indexed
• scopes _ids (documents with different types
may have identical _ids)
• defines a bunch of index-wide settings
• can be defined explicitly or automatically
when a document is indexed
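As a toy illustration of the "automatically" case, the sketch below guesses a mapping from a sample document. The `infer_field` and `infer_mapping` helpers are hypothetical and heavily simplified: they ignore date detection, analyzers and every other mapping option, and only mimic the general idea of JSON integers becoming `long`, floats `double` and text `string`.

```python
def infer_field(value):
    """Guess a mapping entry for one JSON value (hypothetical, simplified)."""
    if isinstance(value, dict):
        return infer_mapping(value)
    if isinstance(value, list):
        # Arrays have no type of their own; use the first element's type.
        return infer_field(value[0]) if value else None
    if value is None:
        return None
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "string"}
    raise TypeError("unsupported value: {!r}".format(value))


def infer_mapping(doc):
    """Build a {"properties": ...} mapping for every field with a known type."""
    properties = {}
    for field, value in doc.items():
        entry = infer_field(value)
        if entry is not None:
            properties[field] = entry
    return {"properties": properties}


mapping = infer_mapping({
    "title": "The Vampyre of Time and Memory",
    "album": {"year": 2013},
    "genres": ["alternative rock", "piano rock"],
})
```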
25. indexes
• like a database in an RDB
• has a mapping which defines types
• logical namespace
• maps to one or more primary shards
• can have zero or more replica shards
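The shard bullets can be sketched as an index-creation body. `number_of_shards` and `number_of_replicas` are the ES setting names; the helper functions and the default values of 5 and 1 are assumptions chosen for illustration.

```python
def index_settings(primary_shards=5, replicas=1):
    """Return a request body for creating an index (hypothetical helper)."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shards(primary_shards, replicas):
    """Each primary shard gets `replicas` copies, so count primaries plus copies."""
    return primary_shards * (1 + replicas)


body = index_settings(primary_shards=5, replicas=1)
shard_count = total_shards(5, 1)
```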
30. ES Search API
Includes:
• Query DSL
• Filter API
• Facet API
• Sort API
• …
Endpoints:
• /index/_search
• /index/type/_search
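A small sketch of the two endpoint shapes and a search body combining a query with a sort. The `search_path` helper and the example field names (taken from the sample song document) are assumptions for illustration.

```python
def search_path(index, doc_type=None):
    """Return /index/_search, or /index/type/_search when a type is given."""
    parts = [index]
    if doc_type:
        parts.append(doc_type)
    parts.append("_search")
    return "/" + "/".join(parts)


# Example body: a match query on the title, newest albums first.
search_body = {
    "query": {"match": {"title": "vampyre"}},
    "sort": [{"album.year": {"order": "desc"}}],
    "size": 10,
}

index_wide = search_path("music")
type_scoped = search_path("music", "songs")
```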
31. filters
filtered queries: nested in the query field; affect
both query results and facet counts
top-level filters: specified at the root of the search
request; affect only the query results, not facet counts
facet-level filters: pre-filter the data before it is
aggregated; each affects only one specific facet
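The three placements can be sketched as request bodies. This assumes the ES 0.90/1.x-era syntax (`filtered` query, `facet_filter`); the top-level key is passed as a parameter because it was named `filter` before 1.0 and `post_filter` from 1.0 on. The helper names are hypothetical.

```python
def filtered_query(query, flt):
    """Filter nested inside the query: narrows both hits and facet counts."""
    return {"query": {"filtered": {"query": query, "filter": flt}}}


def top_level_filter(query, flt, key="post_filter"):
    """Filter at the root of the request: narrows hits only.

    key is "post_filter" in ES 1.x; earlier releases used "filter".
    """
    return {"query": query, key: flt}


def facet_with_filter(name, field, flt):
    """Filter attached to one terms facet: narrows only that facet."""
    return {"facets": {name: {"terms": {"field": field}, "facet_filter": flt}}}


query = {"match": {"title": "time"}}
only_2013 = {"term": {"album.year": 2013}}

nested = filtered_query(query, only_2013)
at_root = top_level_filter(query, only_2013)
per_facet = facet_with_filter("genres", "genres", only_2013)
```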
34. analysis
• performed when documents are added
• manipulates data to ensure better indexing
• 3 steps:
1. character filtering
2. tokenization
3. token filtering
• distinct analyzers for each field
• multiple analyzers for each field
• custom analyzers
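The three steps can be mimicked with a toy pipeline in plain Python. Everything here (the function names, the regexes, the tiny stopword list) is a hypothetical stand-in for the real ES analyzers and only shows the order of the steps.

```python
import re

STOPWORDS = {"the", "of", "and", "a"}  # tiny illustrative list


def strip_html(text):
    """Character filter: replace HTML tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text):
    """Tokenizer: split the text into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def remove_stopwords(tokens):
    """Token filter: drop tokens found in STOPWORDS."""
    return [token for token in tokens if token not in STOPWORDS]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run the three analysis steps in order and return the final tokens."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<b>The Vampyre</b> of Time and Memory",
    char_filters=[strip_html],
    tokenizer=word_tokenizer,
    token_filters=[lowercase, remove_stopwords],
)
# tokens == ["vampyre", "time", "memory"]
```

The order of the token filters matters: lowercasing runs first so that "The" matches the lowercase stopword list.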
39. aggregations
A unit of work that builds analytic information over a
set of documents
bucketing: documents are evaluated and placed into buckets
according to previously defined criteria
metric: keeps track of metrics that are computed over a
set of documents
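To show how bucketing and metrics combine, here is an in-memory sketch: documents are bucketed by a field value, then an average is computed per bucket. The helpers and the three sample documents are made up for illustration; this is not how ES executes aggregations.

```python
from collections import defaultdict


def terms_buckets(docs, field):
    """Bucketing: place each document into one bucket per value of `field`."""
    buckets = defaultdict(list)
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        for value in values:
            buckets[value].append(doc)
    return dict(buckets)


def avg_metric(docs, field):
    """Metric: average of `field` over the documents that have it."""
    values = [doc[field] for doc in docs if field in doc]
    return sum(values) / len(values) if values else None


# Hypothetical sample documents.
docs = [
    {"genres": ["rock"], "year": 2013},
    {"genres": ["rock", "piano"], "year": 2011},
    {"genres": ["jazz"], "year": 2000},
]

buckets = terms_buckets(docs, "genres")
avg_year = {genre: avg_metric(members, "year") for genre, members in buckets.items()}
# avg_year == {"rock": 2012.0, "piano": 2011.0, "jazz": 2000.0}
```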