99 problems but the search aint one, confoo 2011, andrei zmievski

99 Problems, But
The Search Ain’t One
Andrei Zmievski • ConFoo • Mar 9, 2011

who am i?
curl http://localhost:9200/speaker/info/andrei

{"name": "Andrei Zmievski",
"works": "Analog Co-op",
"projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
"likes": ["coding", "beer", "brewing", "photography"],
"twitter": "@a",
"email": "andrei@zmievski.org"}

what is elasticsearch?

a search engine for the NoSQL generation

domain-driven

distributed

RESTful

Hitchhiker’s Guide to the Galaxy (no, really)

document model

document-oriented

JSON-based

schema-free

engine

based on Lucene

multi-tenancy

distributed, out of the box

1. index
request

curl -XPOST http://localhost:9200/conf/speaker/1 -d'
{
    "name": "Andrei Zmievski",
    "talk": "99 Problems, but the Search Ain'\''t One",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": …
}'

response

{
    "ok": true,
    "_index": "conf",
    "_type": "speaker",
    "_id": "1"
}

2. search
request

curl http://localhost:9200/conf/speaker/_search?q=beer

response

{ "took" : 3,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : …,
    "hits" : [ {
      "_index" : "conf",
      "_type" : "speaker",
      "_id" : "1",
      "_score" : …,
      "_source" :
{
    "name": "Andrei Zmievski",
    "talk": "99 Problems, but the Search Ain't One",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": …
} } ] } }

took: the execution time
hits.total: total number of hits
_index: the index of the doc
_type: the type of the doc
_id: the id of the doc
_score: the hit score
_source: the original doc contents
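A minimal sketch of reading such a response on the client side, in Python. The response text below only mirrors the shape shown above; the score values are placeholders invented for the example, and `summarize` is a hypothetical helper name.

```python
import json

# Illustrative response with the same shape as the slide; the scores are
# placeholder values, not figures from the talk.
RAW = """
{ "took" : 3,
  "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.5,
    "hits" : [ {
      "_index" : "conf", "_type" : "speaker", "_id" : "1",
      "_score" : 0.5,
      "_source" : { "name" : "Andrei Zmievski",
                    "likes" : ["coding", "beer", "photography"] }
    } ]
  }
}
"""

def summarize(raw):
    """Return (total_hits, [(index, type, id, score, source), ...])."""
    body = json.loads(raw)
    hits = body["hits"]
    rows = [(h["_index"], h["_type"], h["_id"], h["_score"], h["_source"])
            for h in hits["hits"]]
    return hits["total"], rows

total, rows = summarize(RAW)
for index, doc_type, doc_id, score, source in rows:
    print(index, doc_type, doc_id, score, source["name"])
```

The original document always comes back under `_source`, so the client does not need a second round trip to the primary data store to display a hit.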

3. profit

that’s up to you

distributed model

provides:

performance

resiliency (high-availability)

shards
a portion of the document space

each one is a separate Lucene index

thus, many per-index settings are available

document is sharded by its _id value

but can be assigned (routed) to a shard
deterministically
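A rough sketch of the idea in Python: hash the routing value (the _id unless told otherwise) and take it modulo the shard count. `crc32` is only a stand-in hash picked for this example, not the function elasticsearch uses, and both function names are hypothetical.

```python
import zlib

def pick_shard(routing_value, number_of_shards):
    """Map a routing value to a shard number in [0, number_of_shards)."""
    # crc32 is a stand-in hash for this sketch, not elasticsearch's own.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_shards

def shard_for(doc_id, number_of_shards, routing=None):
    # An explicit routing value overrides the _id, so related documents
    # can be forced onto the same shard.
    key = routing if routing is not None else doc_id
    return pick_shard(key, number_of_shards)

print(shard_for("1", 2))
print(shard_for("1", 2, routing="andrei"), shard_for("2", 2, routing="andrei"))
```

Because the shard is a pure function of the routing value and the shard count, any node can work out where a document lives without asking the others.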

zero-conf discovery

zen (multicast and unicast)

cloud (EC2 via API)

auto-routing

master node:

maintains cluster state

reassigns shards if nodes leave/join cluster

any node can process the search request

the query is handled via scatter-gather mechanism
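Scatter-gather can be pictured with a toy Python model: each shard scores its own documents, and the node that received the request merges the partial results into one global top list. The scoring here (term count) and the names `search_shard` and `scatter_gather` are assumptions made for the sketch.

```python
import heapq

def search_shard(docs, term):
    """Score one shard's docs; here score = how often the term occurs."""
    hits = [(text.lower().split().count(term), doc_id)
            for doc_id, text in docs.items()]
    return sorted((h for h in hits if h[0] > 0), reverse=True)

def scatter_gather(shards, term, size=10):
    # Scatter: every shard runs the query independently.
    partial = [search_shard(docs, term) for docs in shards]
    # Gather: merge the per-shard hits and keep the global top `size`.
    return heapq.nlargest(size, (hit for hits in partial for hit in hits))

shards = [
    {"1": "thomas likes beer", "3": "beer beer beer"},
    {"2": "brewing beer at home", "4": "photography"},
]
top = scatter_gather(shards, "beer", size=2)
print(top)
```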

replicas

each shard can have 1 or more replicas

# of replicas can be updated dynamically after
index creation

replicas can be used for querying in parallel

shard allocation
node 1

start with a single node

shard allocation
node 1
person1
person2

PUT /person {
"index": {
"number_of_shards": 2,
"number_of_replicas": 1
}}

shard allocation
node 1 node 2
person1 person1
person2 person2

start the second node

shard allocation
node 1 node 2 node 3 node 4
person1 person1
person2 person2

start 2 more nodes

document sharding
person1 person1
person2 person2

PUT /person/info/1
{…}
hashed to shard 1, then replicated

PUT /person/info/2
{…}
hashed to shard 2, then replicated

scatter-gather
person1 person1
person2 person2

GET /person/_search?q=name:thomas


transactional model

per-document consistency

no need to commit/flush

uses write-ahead transaction log

write consistency (W) can be controlled

one, quorum, or all
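One way to read the three levels, sketched in Python. The quorum rule used here is a plain majority of all shard copies (primary plus replicas); that rule is an assumption of this sketch, and the exact rule elasticsearch applies may differ.

```python
def required_copies(level, number_of_replicas):
    """How many shard copies (primary + replicas) must be available."""
    copies = 1 + number_of_replicas
    if level == "one":
        return 1
    if level == "quorum":
        return copies // 2 + 1  # plain majority; an assumption of this sketch
    if level == "all":
        return copies
    raise ValueError("unknown consistency level: %r" % level)

def write_allowed(level, number_of_replicas, active_copies):
    """True if enough copies are up for a write at this level."""
    return active_copies >= required_copies(level, number_of_replicas)

for level in ("one", "quorum", "all"):
    print(level, required_copies(level, number_of_replicas=2))
```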

(near) real-time search

1 second refresh rate by default

_refresh API also

index storage

node data considered transient

can be stored in local file system, JVM heap,
native OS memory, or FS & memory combination

persistent storage requires a gateway

gateways
persistent store for cluster state and indices

asynchronous, translog-based write strategy

allows full recovery if a cluster restart is needed

supported gateways:
local
shared FS
Hadoop via HDFS
S3

mapping
describes document structure to the search
engine

automatically created with sensible defaults

explicit mapping can be provided (generally, a
good idea)

can run into merge conflicts

mapping

important meta fields:

_source

_all

there are more

mapping types

simple:

string, integer/long, float/double, boolean, and
null

complex:

array, object

sample mapping
document

{"user":      "derick",
 "title":     "Don't Panic",
 "tags":      ["profiling", "debugging", "php"],
 "postDate":  "2010-12-22T…",
 "priority":  2}

mapping

{"post": {
  "properties" : {
    "user":      {"type": "string", "index": "not_analyzed"},
    "message":   {"type": "string", "boost": …},
    "tags":      {"type": "string", "include_in_all": "no"},
    "postDate" : {"type" : "date", "store": "no"},
    "priority" : {"type" : "integer"}
}}}

analyzers
break down (tokenize) and normalize fields during
indexing and query strings at search time

analyzer = tokenizer + token filters (0 or more)
Standard Analyzer =
   Standard Tokenizer +
       Standard Token Filter +
       Lowercase Token Filter +
       Stop Token Filter
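The "tokenizer + token filters" pipeline can be mimicked in a few lines of Python. This is only a sketch: the regex tokenizer and the stop-word list are simplified stand-ins, not Lucene's actual Standard Tokenizer or stop set.

```python
import re

STOP_WORDS = {"a", "an", "and", "but", "the", "of", "to"}  # illustrative subset

def standard_tokenizer(text):
    # Rough stand-in: keep runs of letters, digits and apostrophes.
    return re.findall(r"[A-Za-z0-9']+", text)

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, tokenizer=standard_tokenizer,
            filters=(lowercase_filter, stop_filter)):
    """Run the tokenizer, then each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze("99 Problems, but the Search Ain't One")
print(tokens)
```

The same `analyze` call has to be applied to the query string at search time, otherwise a query for "Search" would never match the indexed token "search".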

analyzers
analyzers, tokenizers, and filters can be
customized

elasticsearch.yml

index:
  analysis:
    analyzer:
      eulang:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, stop,
                 asciifolding, porterStem]

mapping

…
"title": {"type": "string", "analyzer": "eulang"},
…

API conventions

append ?pretty=true to get readable JSON

boolean values: false/0/off = false, rest is true

JSONP support via callback parameter
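The boolean convention is easy to get backwards, so here is a tiny Python sketch of it exactly as stated on the slide. `parse_bool` is a hypothetical helper, and treating the comparison as case-sensitive is an assumption.

```python
FALSE_VALUES = {"false", "0", "off"}

def parse_bool(value):
    """Per the slide: false/0/off mean false, anything else is true."""
    return value not in FALSE_VALUES

for raw in ("false", "0", "off", "true", "1", "no"):
    print(raw, parse_bool(raw))
```

Note that under this rule a value such as "no" counts as true, since only the three listed spellings are false.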

API structure

http://host:port/[index]/[type]/[_action/id]

GET http://es:9200/_status

GET http://es:9200/twitter/_status

POST http://es:9200/twitter/tweet/1

GET http://es:9200/twitter/tweet/1

API structure
http://host:port/[index]/[type]/[_action/id]

GET http://es:9200/twitter/tweet/_search

GET http://es:9200/twitter/user/_search

GET http://es:9200/twitter/tweet,user/_search

GET http://es:9200/twitter,facebook/_search

GET http://es:9200/_search
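The URL scheme above composes mechanically, which a short Python sketch makes visible. `search_url` is a hypothetical helper; it only builds the string and sends nothing.

```python
def search_url(host, indices=(), types=()):
    """Build http://host/[index]/[type]/_search, comma-joining multiples."""
    parts = [host.rstrip("/")]
    if indices:
        parts.append(",".join(indices))
    if types:
        if not indices:
            raise ValueError("this sketch needs an index before a type")
        parts.append(",".join(types))
    parts.append("_search")
    return "/".join(parts)

print(search_url("http://es:9200"))
print(search_url("http://es:9200", ["twitter", "facebook"]))
print(search_url("http://es:9200", ["twitter"], ["tweet", "user"]))
```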

API query example
{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "foo bar",
                    "default_operator": "AND",
                    "fields": ["title", "description"],
                    "boost": 2.0
                }
            },
            "filter": {
                "range": {"date": {"gt": "2011-03-09"}}
            }
        }
    },
    "from": 10,
    "size": 10
}
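Request bodies like this are usually assembled in code rather than typed by hand. A minimal Python sketch, assuming a hypothetical `filtered_query` helper and an illustrative cut-off date; it builds the JSON and does not talk to a server.

```python
import json

def filtered_query(text, fields, date_after, page=1, size=10):
    """Build a search body shaped like the example above."""
    return {
        "query": {
            "filtered": {
                # Scored part: full-text match, all terms required.
                "query": {
                    "query_string": {
                        "query": text,
                        "default_operator": "AND",
                        "fields": list(fields),
                    }
                },
                # Unscored part: only keep docs newer than date_after.
                "filter": {"range": {"date": {"gt": date_after}}},
            }
        },
        # from/size paging: page 1 starts at offset 0.
        "from": (page - 1) * size,
        "size": size,
    }

body = filtered_query("foo bar", ["title", "description"], "2011-01-01", page=2)
payload = json.dumps(body)
print(payload)
```

With `size=10`, page 2 yields `"from": 10`, which is the paging shown in the example.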

API {core}
index
bulk
delete
delete by query
get
count

search
query
from/size paging
sort
highlighting
selective fields

API {indices}
create
delete
open/close
get/put/delete mapping
optimize
snapshot
update settings
analyze
status
refresh
flush

Query DSL
term / terms
range
prefix
bool
fuzzy
wildcard
query_string (default_operator, analyzer, phrase_slop, etc)

filters

share some similar features with queries (term,
range, etc)

why use a filter?

filters
faster than queries

cached (depends on the filter)

the cache is used for different queries against
the same filter

no scoring

more useful ones: term, terms, range, prefix, and,
or, not, exists, missing, query
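Why caching and "no scoring" go together can be shown with a toy Python model: a filter is a yes/no set of doc ids, so its result can be stored once and reused by any query that applies the same filter. The class and function names are hypothetical, and the term-count scoring is a stand-in.

```python
class FilterCache:
    """Caches the set of matching doc ids per (field, value) filter."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.misses = 0

    def term(self, field, value):
        key = (field, value)
        if key not in self.cache:
            self.misses += 1  # computed once, then reused
            self.cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items()
                if doc.get(field) == value)
        return self.cache[key]

def search(docs, cache, word, field, value):
    allowed = cache.term(field, value)  # membership only, no scoring
    scored = ((doc["text"].split().count(word), doc_id)
              for doc_id, doc in docs.items() if doc_id in allowed)
    return sorted((s for s in scored if s[0] > 0), reverse=True)

docs = {
    "1": {"type": "speaker", "text": "beer and code"},
    "2": {"type": "attendee", "text": "beer beer"},
    "3": {"type": "speaker", "text": "photography"},
}
cache = FilterCache(docs)
first = search(docs, cache, "beer", "type", "speaker")
second = search(docs, cache, "photography", "type", "speaker")
print(first, second, cache.misses)
```

Two different queries ran against the same filter, and the filter was evaluated only once.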

facets

provide aggregated data based on the search
request

terms, histogram, date histogram, range,
statistical, and more
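A terms facet is essentially a per-value count over the documents that matched the search. A small Python sketch of that idea, with a hypothetical `terms_facet` helper and made-up sample hits:

```python
from collections import Counter

def terms_facet(hits, field, size=10):
    """Count field values across the hits; return the `size` most common."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)

# Made-up hits for illustration.
hits = [
    {"name": "a", "likes": ["coding", "beer"]},
    {"name": "b", "likes": ["beer"]},
    {"name": "c", "likes": "photography"},
]
facet = terms_facet(hits, "likes", size=1)
print(facet)
```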

geo search

implemented as filters (and a facet)
geo_distance

geo_bounding_box

geo_polygon
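What the first two geo filters decide can be sketched in Python. This uses the haversine great-circle formula and a naive box test that ignores boxes crossing the antimeridian; both are assumptions of the sketch rather than a description of how elasticsearch computes them, and the coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_distance(point, center, max_km):
    return haversine_km(point[0], point[1], center[0], center[1]) <= max_km

def geo_bounding_box(point, top_left, bottom_right):
    # Naive test; does not handle boxes that cross the antimeridian.
    lat, lon = point
    return (bottom_right[0] <= lat <= top_left[0]
            and top_left[1] <= lon <= bottom_right[1])

montreal = (45.50, -73.57)  # approximate coordinates
ottawa = (45.42, -75.70)
print(round(haversine_km(*montreal, *ottawa)))
print(geo_distance(ottawa, montreal, 200))
print(geo_bounding_box(montreal, (46.0, -74.0), (45.0, -73.0)))
```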

interfaces

REST

Java / Groovy

Language clients (REST/Thrift):

pyes, PHP (standalone and symfony), Ruby, Perl

Flume sink implementation

data import

ES is not the primary data store (usually)

to import/synchronize data:

write an agent (Gearman, message queues, etc)

use rivers (CouchDB, RabbitMQ, Twitter)

10 more features
versioning
index aliases
parent/child docs
scripting
dynamic mapping templates
load balancing nodes
plugins
more_like_this
multi_field mapping
percolation

References
http://github.com/elasticsearch/elasticsearch

http://groups.google.com/a/elasticsearch.com/group/users/

IRC: #elasticsearch on irc.freenode.net

twitter: @elasticsearch
