Elasticsearch Internal - Shards
Jiaming (Jason) Zhang
31 May 2017
Recap
Launch Elasticsearch (ES) and Kibana (a visualization tool for ES) locally
brew install elasticsearch
# Start Elasticsearch now and automatically at login
# By default, Elasticsearch will be available at http://localhost:9200
brew services start elasticsearch
###################################
brew install kibana
# Start Kibana now and automatically at login
# By default, Kibana will be available at http://localhost:5601
brew services start kibana
###################################
# Verify that both services are started
brew services list | grep -E "elasticsearch|kibana"
Elasticsearch Internal - Shards
An Elasticsearch cluster has 2 indices. Here are the settings for one of them:
# Index: entity_companies
{
    "number_of_shards": 3,
    "number_of_replicas": 2
}
How many shards does the cluster have? It also depends on the other index.
How many shards does this index have? The number actually allocated also depends on the number of nodes in the cluster.
How many shards does this index expect to have? 9 shards = 3 primary shards * (1 primary + 2 replicas)
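The arithmetic above, and the dependence on node count, can be sketched in Python. This is a minimal sketch: the function names are illustrative, and the node-based bound assumes only the rule that Elasticsearch does not place two copies of the same shard on one node. Disk limits and other allocation settings are ignored.

```python
def expected_shard_count(primary_shards, replicas_per_primary):
    """Shard copies the index is configured to have: primaries plus all replicas."""
    return primary_shards * (1 + replicas_per_primary)


def max_assignable_shards(primary_shards, replicas_per_primary, nodes):
    """Upper bound on shard copies that can be assigned on a cluster of `nodes` nodes.

    Assumption: each copy of a given shard needs its own node, so at most
    min(1 + replicas, nodes) copies of each shard can be assigned.
    """
    copies_per_shard = min(1 + replicas_per_primary, nodes)
    return primary_shards * copies_per_shard


print(expected_shard_count(3, 2))  # 9
for nodes in (1, 2, 3):
    # Prints "1 3", "2 6", "3 9": the index only reaches 9 shards with 3+ nodes.
    print(nodes, max_assignable_shards(3, 2, nodes))
```

On a single local node, only the 3 primaries can be assigned, which is why a freshly created index with replicas often reports a non-green health.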
Elasticsearch Internal - Shards
Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.
Thus, the number of primary shards effectively determines how much data an index can hold.
Elasticsearch Reference - Basic Concepts
Elasticsearch Internal - Shards
What factors determine how much data an index can store?
The main factors are:
- Number of primary shards
  - The maximum number of documents in a shard is bounded by the maximum number of documents in a Lucene index (around 2 billion). [1]
- Hardware (e.g. number of nodes, disk space, CPU)
- Document size
- Use case (e.g. queries, expected response time)
[1] Elasticsearch Reference - Basic Concepts
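To make the document-count bound concrete, here is a small Python sketch. It assumes the Lucene per-index limit of 2,147,483,519 documents (Java's Integer.MAX_VALUE minus 128); the function names are illustrative. It covers only the first factor in the list, so real sizing still has to account for hardware, document size, and the use case.

```python
# Assumed Lucene hard limit on documents per index, i.e. per Elasticsearch shard.
LUCENE_MAX_DOCS_PER_SHARD = 2_147_483_647 - 128


def max_documents(primary_shards):
    """Theoretical ceiling on documents in an index with this many primary shards."""
    return primary_shards * LUCENE_MAX_DOCS_PER_SHARD


def min_primary_shards_for(expected_documents):
    """Fewest primary shards whose combined ceiling covers the expected document count."""
    # Integer ceiling division without floating point.
    return max(1, -(-expected_documents // LUCENE_MAX_DOCS_PER_SHARD))


print(max_documents(5))
print(min_primary_shards_for(5_000_000_000))
```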
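Elasticsearch Internal - Shards
Create index entity_companies with default settings
PUT entity_companies
Get index entity_companies metadata
GET _cat/indices/entity_companies?v&h=health,index,pri,rep
health index            pri rep
yellow entity_companies   5   1
By default, an index has 5 primary shards and 1 replica (these were the defaults at the time of this talk; since Elasticsearch 7.0 the default is 1 primary shard). The health is yellow because a single local node has nowhere to allocate the replica shards.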
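When scripting against the _cat endpoint, the whitespace-separated table it returns with ?v can be turned into records. The following Python sketch shows one way to do that; parse_cat_table is a hypothetical helper, and it assumes no cell is empty or contains whitespace.

```python
def parse_cat_table(text):
    """Parse a _cat response that includes a header row (requested with ?v)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # One dict per data row, keyed by column name; all values stay strings.
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """health index            pri rep
yellow entity_companies   5   1
"""

for row in parse_cat_table(sample):
    print(row["index"], "primaries:", int(row["pri"]), "replicas:", int(row["rep"]))
```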
Elasticsearch Internal - Shards
Create index entity_people with specified settings
PUT entity_people
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 1
        }
    }
}
Update number_of_replicas after index entity_people is created
PUT entity_people/_settings
{
    "index" : { "number_of_replicas" : 2 }
}
Elasticsearch Internal - Shards
Update number_of_shards after index entity_people is created (THIS WILL FAIL)
PUT entity_people/_settings
{
    "index" : { "number_of_shards" : 2 }
}
Response: 400
Reason: Can't update non dynamic settings [[index.number_of_shards]]
    for open indices [[entity_people/JGW2oY98RZeZJM8mFN5h_w]]
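A client could catch this mistake before sending the request. The Python sketch below is only an illustration: static_settings_in is a hypothetical helper, and its STATIC_INDEX_SETTINGS set deliberately contains only the one setting named in the error above, not the full list of non-dynamic settings.

```python
# Illustrative, NOT exhaustive: only the static setting shown in the error above.
STATIC_INDEX_SETTINGS = {"number_of_shards"}


def static_settings_in(update):
    """Return the settings in an update body that cannot be changed on an open index."""
    # Accept both {"index": {...}} and a flat body such as {"index.number_of_shards": 2}.
    settings = update.get("index", update)
    found = []
    for key in settings:
        name = key[len("index."):] if key.startswith("index.") else key
        if name in STATIC_INDEX_SETTINGS:
            found.append(name)
    return sorted(found)


print(static_settings_in({"index": {"number_of_replicas": 2}}))  # []
print(static_settings_in({"index": {"number_of_shards": 2}}))    # ['number_of_shards']
```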
Elasticsearch Internal - Shards
Why doesn't Elasticsearch support changing the number of primary shards on the fly?
The answer is related to how Elasticsearch determines which shard a document should be stored in.
"... if the number of primary shards ever changed in the future, all previous routing values would be invalid and documents would never be found."
Elasticsearch Guide - Routing Value
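The routing rule behind this is usually written as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The Python sketch below illustrates why changing the divisor breaks lookups. It is only an illustration: pick_shard is a hypothetical name, and CRC32 stands in for the hash function Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value (by default the document ID) to a shard number."""
    # Stand-in hash: Elasticsearch uses its own hash function, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]

# Documents were stored using 3 primary shards. If lookups suddenly used 4,
# every document whose shard number changes would be searched for in the wrong place.
misrouted = [d for d in doc_ids if pick_shard(d, 3) != pick_shard(d, 4)]
print(f"{len(misrouted)} of {len(doc_ids)} documents would be looked up on the wrong shard")
```

Because the shard number is recomputed from the ID on every read, the only safe way to change the divisor is to write every document again, which is what reindexing does.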
Elasticsearch Internal - Shards
What should I do if I really need to increase the number of primary shards? Reindex all documents.
Reindexing is needed when we want to change certain immutable settings of an index, such as increasing the number of primary shards or changing an existing field mapping.
Reindexing simply means (1) cycling through all documents in the existing index and (2) re-inserting them into a new index that has the desired settings. The Reindex API was introduced in Elasticsearch 2.3 (2016).
POST _reindex
{ "source": { "index": "entity_companies" },
  "dest": { "index": "entity_companies_v2" }}
The Reindex API also supports
1. Selective Reindex Operation
2. Using Scripts with the Reindexing API
3. Reindexing for Mapping Changes
Reindex Your Documents with Ease and without Third-Party Scripts using Elasticsearch
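As a sketch of the first option in the list, a selective reindex adds a query to the source section of the request body so that only matching documents are copied. The Python helper below only builds that body; build_reindex_body is a hypothetical name, and the country field in the usage example is an assumed field, not one taken from the talk.

```python
import json


def build_reindex_body(source_index, dest_index, query=None):
    """Build a _reindex request body; `query` optionally restricts which documents are copied."""
    source = {"index": source_index}
    if query is not None:
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


# Full copy, equivalent to the request shown above.
full = build_reindex_body("entity_companies", "entity_companies_v2")

# Selective copy: only documents matching a term query on a hypothetical "country" field.
selective = build_reindex_body(
    "entity_companies",
    "entity_companies_v2",
    query={"term": {"country": "US"}},
)
print(json.dumps(selective, indent=2))
```

The resulting dictionary would be sent as the JSON body of POST _reindex.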
Elasticsearch Internal - Shards
Use an index alias if you need to reindex your documents regularly
Elasticsearch Internal - Shards
Let's put it together
An alias cannot share its name with an existing index, so the steps below assume that the existing index is named entity_companies_v1 and that clients reach it through the alias entity_companies.
Step 1: Create the index with the new settings
PUT entity_companies_v2
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 1
        }
    }
}
Step 2: Connect the alias to the new index
POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "entity_companies_v2", "alias" : "entity_companies" } }
    ]
}
Step 3: Reindex all documents from the old index to the new one
POST _reindex
{
    "source": { "index": "entity_companies_v1" },
    "dest": { "index": "entity_companies_v2" }
}
Step 4: Remove the alias's connection to the old index
POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "entity_companies_v1", "alias" : "entity_companies" } }
    ]
}
Elasticsearch Reference - Indices Aliases
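An alternative to adding and removing the alias in two separate requests is to do both in a single _aliases call after reindexing has finished, since the actions in one request are applied atomically. The Python sketch below only builds that request body; build_alias_swap is a hypothetical helper, and the old index name entity_companies_v1 is an assumption made for illustration.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap = build_alias_swap(
    alias="entity_companies",
    old_index="entity_companies_v1",  # assumed name of the existing index
    new_index="entity_companies_v2",
)
print(json.dumps(swap, indent=2))
```

With this variant the alias never points at both indices at once, so searches through the alias do not see documents twice while the copy is in progress.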
Reading
If you find this topic interesting, here is some further reading:
- Optimizing Elasticsearch: How Many Shards per Index?
- Elasticsearch: The Definitive Guide (by Clinton Gormley & Zachary Tong)
  - Chapter 2, Life Inside a Cluster: This chapter explains what Elasticsearch's internals look like. The shard is the most important concept for understanding them. It also explains what happens when we add more nodes to the cluster or when a cluster node fails.
  - Chapter 4, Distributed Document Store: This chapter explains how shards communicate with each other when a create/update/delete/query request is made, as well as other topics related to Elasticsearch's distributed nature.
