Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker

Elasticsearch & Docker
Rafał Kuć – Sematext Group, Inc.
@kucrafal @sematext sematext.com
Running High Performance
Fault Tolerant
Elasticsearch Clusters On Docker

About me…
Sematext consultant & engineer
Solr.pl co-founder
Father and husband :)

You Are Probably Familiar With This
Development

Development Test

Development Test QA

Development Test QA
Production environment

And The Problems That Come With It
Resources not utilized

Overprovisioned
Servers

Overprovisioned
Servers
≠ ≠

The solution
Development Test QA Production

What is Docker?
Lightweight
Based on
Open Standards
Secure

Containers vs Virtual Machines
Hardware
Traditional Virtual Machine

Hardware
Host Operating System

Hardware
Hypervisor

Hardware
Hypervisor
Guest OS Guest OS

Hardware
Hypervisor
Guest OS Guest OS
Libraries Libraries

Hardware
Hypervisor
Guest OS Guest OS
Libraries Libraries
Application 1 Application 2

Hardware
Hypervisor
Guest OS Guest OS
Libraries Libraries
Hardware
Traditional Virtual MachineContainer

Hardware
Hypervisor
Guest OS Guest OS
Libraries Libraries
Hardware
Docker Engine

Hardware
Hypervisor
Guest OS Guest OS
Libraries Libraries
Hardware
Docker Engine
Libraries Libraries

What is Elasticsearch?
Reasonable
defaults { JSON }
Distributed
by design
http://www.dailypets.co.uk/2007/06/17/kittens-rest-at-half-time/

Running Official Elasticsearch Container
$ docker run -d elasticsearch

$ docker run -d elasticsearch == docker run -d elasticsearch:latest

$ docker run -d elasticsearch:1.7

$ docker run --name es_1 -h es_master_1 elasticsearch
$ docker run -d elasticsearch:1.7

Container Constraints
$ docker run -d -m 2G elasticsearch
http://docs.docker.com/engine/reference/run/

$ docker run -d -m 2G --memory-swappiness=0 elasticsearch

$ docker run -d --cpuset-cpus="1,3" elasticsearch

$ docker run -d --cpuset-cpus="1,3" elasticsearch
$ docker run -d --cpu-period=50000 --cpu-quota=25000
elasticsearch

Creating Optimized Image
Dockerfile:
FROM elasticsearch
ADD ./elasticsearch.yml /usr/share/elasticsearch/config/

Dockerfile:
FROM elasticsearch
$ docker build -t devops/example .

Dockerfile:
FROM elasticsearch
$ docker build -t devops/example .
Sending build context to Docker daemon 3.072 kB
Step 1 : FROM elasticsearch
---> 8112755253f1
Step 2 : ADD ./elasticsearch.yml /usr/share/elasticsearch/config/
---> Using cache ---> c9ca48a22e58
Successfully built c9ca48a22e58

Dealing With Network
$ docker run -d -p 9200:9200 -p 9300:9300 elasticsearch

-Dnetwork.publish_host=192.168.1.1

Network - Good Practices
Separate network for Elasticsearch cluster

Common host names for containers
$ docker run -d -h es_node_1 elasticsearch

Expose 9200 & 9300 ports only for client nodes

Expose 9200 & 9300 ports only for client nodes
Elasticsearch data & client nodes point to masters only

Dealing With Storage
By default in /usr/share/elasticsearch/data

By default not persisted

$ docker run -d
-v /opt/elasticsearch/data:/usr/share/elasticsearch/data
elasticsearch

$ docker run -d
-v /opt/elasticsearch/data:/usr/share/elasticsearch/data
elasticsearch
Use data only containers
Permissions

Data-Only Docker Volumes
Bypasses Union File System

Can be shared between containers

Data volumes persist if the container itself is deleted

$ docker create -v /mnt/es/data:/usr/share/elasticsearch/data
--name esdata elasticsearch
Permissions

$ docker create -v /mnt/es/data:/usr/share/elasticsearch/data
--name esdata elasticsearch
$ docker run --volumes-from esdata elasticsearch

Highly Available Cluster
Master only
Master only
Master only
Data only
Data only
Data only
Data only
Data only
Data only
Client only
Client only

Master only
Master only
Master only
Data only
Data only
Data only
Data only
Data only
Data only
Client only
Client only
minimum_master_nodes = N/2 + 1

Master only
Master only
Master only
Data only
Data only
Data only
Data only
Data only
Data only
Client only
Client only
minimum_master_nodes = N/2 + 1
recovery.after.nodes
recovery.expected.nodes
cluster.routing.allocation.node_concurrent_
recoveries
index.unassigned.node_left.delayed_timeout
index.priority

Master Nodes & Docker
-Dnode.master=true
-Dnode.data=false
-Dnode.client=false

Client Nodes & Docker
-Dnode.master=false
-Dnode.data=false
-Dnode.client=true

Data Nodes & Docker
-Dnode.master=false
-Dnode.data=true
-Dnode.client=false

Scaling
Elasticsearch Node Elasticsearch Node
Elasticsearch Node Elasticsearch Node

Scaling
curl -XPUT 'http://localhost:9200/devops/' -d '{
"settings" : {
"index" : {
"number_of_shards" : 4,
"number_of_replicas" : 0
}
}
}'

Scaling
curl -XPUT 'http://localhost:9200/devops/_settings' -d '{
"index.number_of_replicas" : 1
}'

Scaling
curl -XPUT 'http://localhost:9200/devops/_settings' -d '{
"index.number_of_replicas" : 2
}'

Scaling
P P
P P
R R
R R
R R
R R

Scaling
P P PP
UnassignedR
R
R
R

RAM Buffer
indices.memory.index_buffer_size: 10%
indices.memory.min_index_buffer_size: 48mb
indices.memory.max_index_buffer_size (unbounded)
indices.memory.min_shard_index_buffer_size: 4mb

RAM Buffer
indices.memory.index_buffer_size: 10%
indices.memory.min_index_buffer_size: 48mb
indices.memory.max_index_buffer_size (unbounded)
indices.memory.min_shard_index_buffer_size: 4mb
Higher Indexing
Throughput
Lower Indexing
Throughput
defaults ><

Time-Based Data?
2015-11-23
TODAY
WEEK

Time-Based Data?
curl -XPOST 'http://localhost:9200/_aliases' -d '{
"actions" : [
{ "add" : {"index":"2015-11-23","alias":"today"} },
{ "add" : {"index":"2015-11-23","alias":"week"} }
]}'

Time-Based Data?
2015-11-23 2015-11-24
TODAY
WEEK

Time-Based Data?
2015-11-23 2015-11-24 2014-11-25
TODAY
WEEK

Multiple Tiers
node.tag=hot node.tag=cold node.tag=cold

Multiple Tiers
curl -XPUT 'localhost:9200/data_2015-11-23' -d '{
"settings": {
"index.routing.allocation.include.tag" : "hot"
}
}'

Multiple Tiers
data_2015-11-23
data_2015-11-23

Multiple Tiers
curl -XPUT 'localhost:9200/data_2015-11-23/_settings' -d '{
"settings": {
"index.routing.allocation.exclude.tag" : "hot",
"index.routing.allocation.include.tag" : "cold",
}
}'

Multiple Tiers
data_2015-11-23
data_2015-11-23
data_2015-11-24
data_2015-11-24

Multiple Tiers
data_2015-11-23
data_2015-11-23
data_2015-11-25
data_2015-11-25
data_2015-11-24
data_2015-11-24

Multiple Tenants
Hot
Hot
Cold
Cold
Cold
Cold

Multiple Tenants
Hot
Hot
Cold
Cold
Cold
Cold
R
O
U
T
I
N
G

Indexing Without Routing
Shard 1 Shard 2 Shard 3 Shard 4
Elasticsearch
Application
userA
userA
userA
userA
userAuserA
userA
userA

Indexing With Routing
Elasticsearch
Application

Querying Without Routing
Elasticsearch
Application

Querying With Routing
Elasticsearch
Application

Routing vs No Routing
Queries without routing (200 shards, 1 replica)
#threads Avg response time Throughput 90% line Median CPU Utilization
1 3169ms 19,0/min 5214ms 2692ms 95 – 99%

Routing vs No Routing
Queries without routing (200 shards, 1 replica)
1 3169ms 19,0/min 5214ms 2692ms 95 – 99%
Queries with routing (200 shards, 1 replica)
10 196ms 50,6/sec 642ms 29ms 25 – 40%
20 218ms 91,2/sec 718ms 11ms 10 – 15%

Monitoring
https://sematext.com/spm/integrations/docker-monitoring.html
https://github.com/sematext/spm-agent-docker

Short summary
http://www.soothetube.com/2013/12/29/thats-all-folks/

We Are Hiring!
Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig Logging?
Dig working with, and in, open–source?
We’re hiring worldwide!
http://sematext.com/about/jobs.html

Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com
http://blog.sematext.com
Thank You !

Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Semelhante a Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker

Semelhante a Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker (20)

Mais de Sematext Group, Inc.

Mais de Sematext Group, Inc. (20)

Último

Último (20)

Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker

Notas do Editor