2. "ELK" is the formed forthree open source projects: Elasticsearch,
Logstash, and Kibana. Elasticsearch is a search and analytics engine.
Logstash is a server side data processing pipeline that ingests data from‑
multiple sources simultaneously, transforms it, and then sends it to a
"stash" like Elasticsearch. Kibana lets users visualize data with charts
and graphs in Elasticsearch.
Beats is installed in source side working as data shipper, basically send
data from source to logstash orelasticsearch.
The Elastic Stack is the next evolution of the ELKStack
3. Deep Dive on Elasticsearch
• Comparing terms in Elasticsearch and a Relational Database System
• Elasticsearch terms (Index, Type, Documents, ID)
• Elasticsearch Cluster, Nodes, Shards and Replicas concepts
• Elasticsearch Scalability and Availability
• Elasticsearch Installation
• Elasticsearch GET API, PUT API, POST API, DELETE API and bulk upload
• Elasticsearch repository snapshot backup and restoration
• Elasticsearch Head plugin for graphical visualization
• Why is Elasticsearch very fast?
• Strengths and limitations of Elasticsearch
• Two successful use cases for Elasticsearch
4. Comparing terms in Elasticsearch and a Relational Database System
MySQL => Databases => Tables => Columns/Rows
Elasticsearch => Indexes => Types => Documents with Properties
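The analogy above can be sketched as a small lookup table. This is illustrative only: the helper name is my own, and the "type" level was removed from later Elasticsearch versions.

```python
# Rough analogy between relational-database and Elasticsearch terms.
# Illustrative only: Elasticsearch "types" were deprecated in later versions.
RDBMS_TO_ES = {
    "database": "index",
    "table": "type",
    "row": "document",
    "column": "property",
}

def translate(rdbms_term: str) -> str:
    """Return the rough Elasticsearch equivalent of a relational term."""
    return RDBMS_TO_ES[rdbms_term.lower()]
```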
5. Elasticsearch Terms (Index, Type, Documents, ID)
Index:
•Collection of documents having similar characteristics
•Equivalent to a database instance in a relational database
•Has a mapping which defines multiple types
•A logical namespace mapped to one or more primary shards
•Can have zero or more replicas
Types:
•Equivalent to a table in a relational database
•Each type has a list of fields
•A mapping defines how each field is analyzed
6. Example of Index and Types:
•A car manufacturing scenario has a Factory index. Within this
index, you have three different types:
•People
•Cars
•Spare Parts
Document:
•A JSON document stored in Elasticsearch
•Equivalent to a row in a relational database
ID:
•A unique identifier for a document
•The combination of index, type and id must be unique in order to
identify a document deterministically.
•The user can specify the id, but it can also be auto-generated by
Elasticsearch if it is not provided.
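A minimal sketch of the (index, type, id) addressing described above. The `index_doc` helper is hypothetical; like Elasticsearch, it auto-generates an id when none is supplied, though with `uuid` rather than Elasticsearch's actual id scheme.

```python
import uuid
from typing import Optional

def index_doc(index: str, doc_type: str, doc: dict,
              doc_id: Optional[str] = None) -> str:
    """Return the unique /index/type/id path that identifies a document.
    When no id is given, generate one, mirroring Elasticsearch's
    auto-id behaviour (uuid here is only a stand-in)."""
    if doc_id is None:
        doc_id = uuid.uuid4().hex
    return f"/{index}/{doc_type}/{doc_id}"

# The car-factory example: a document is just a JSON-style object.
car = {"model": "Golf", "year": 2009, "km": 150000}
```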
7. Elasticsearch Cluster, Nodes, Shards and Replicas Concepts
Cluster:
•Collection of nodes
•Identified by a unique name
Nodes:
•Each server of a cluster is called a node. In Elasticsearch, nodes
are divided into three types according to their functionality.
Master Node
•Lightweight node
•Responsible for cluster management
•Keeps the cluster stable
•It is not recommended to send index or search requests to this node
Data Node
•Responsible for storing the actual data
•Participates in the indexing process
8. Client Node
•Acts as a load balancer for processing requests
•Used to perform scatter/gather-based operations like search
•Neither stores data nor participates in cluster management
•Relieves the data nodes of the heavy work of searching
Shards
•Logical unit for storing data
•Each document is stored in a single primary shard
•By default each index has five shards
Replicas
•Each index can have 0 or more replicas
•Helps with failover and performance
•A replica is never stored on the same node as its primary shard
•Can be changed after index creation
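How a document lands on one primary shard can be sketched as hash-and-modulo routing. This is a simplification: Elasticsearch actually uses a murmur3 hash of the routing value; `md5` below only keeps the sketch deterministic. It also shows why the shard count cannot change after index creation: a different modulus would route existing ids to different shards.

```python
import hashlib

NUM_PRIMARY_SHARDS = 5  # the default mentioned above

def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Route a document id to a primary shard via hash modulo shard count.
    (Elasticsearch uses murmur3; md5 here only makes the sketch deterministic.)"""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards
```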
9. Elasticsearch Scalability and Availability
* Elasticsearch Scalability
Partitioning data across multiple machines allows Elasticsearch to
scale beyond what a single machine can do and to support high-throughput
operations. Data is divided into small parts called shards. As the
index is distributed across multiple shards, a query against an index
is executed in parallel across all the shards. The results from each
shard are then gathered and sent back to the client. Executing the
query in parallel greatly improves search performance.
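The scatter/gather execution described above can be sketched in a few lines. The toy `shards` list and `search_shard` function are assumptions for illustration, not Elasticsearch internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy index split into three shards, each holding a slice of the documents.
shards = [
    [{"id": 1, "text": "going home"}, {"id": 2, "text": "fast search"}],
    [{"id": 3, "text": "going out"}],
    [{"id": 4, "text": "analytics engine"}],
]

def search_shard(shard, term):
    """The per-shard query: independent work, so shards can run in parallel."""
    return [doc for doc in shard if term in doc["text"]]

def search(term):
    """Scatter the query to every shard in parallel, then gather the hits."""
    with ThreadPoolExecutor() as pool:
        per_shard_hits = pool.map(search_shard, shards, [term] * len(shards))
    return [doc for hits in per_shard_hits for doc in hits]
```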
10. Elasticsearch Scalability
In the diagram on the previous page the index is distributed across
3 nodes, which is why nodes 1 and 2 hold 2 shards each and node 3
holds one shard. If, a few days later, the user finds that 3 nodes
are not enough, the user can add two more nodes; the total will then
be 5 nodes, with the 5 shards distributed one per node.
Note: once the user defines the number of shards, it cannot be changed
after the index is created.
11. Elasticsearch Availability
• Here there are three data nodes and the number of shards for this index
is set to 3, so the three shards are divided across the three nodes, one
shard each. The number of replicas is set to 1, which is enough for this
configuration; Elasticsearch places the replica of each shard on a
different node.
12. Elasticsearch Availability
• Suppose that, for whatever reason, data node 4 goes down; the situation
is then as below.
The system is still accessible at that moment, because node 4's shard 2
has a replica R2 on node 3, which becomes the primary, and node 4's
replica R3 has its primary data on node 5 as shard 3.
• Note: Configuring availability is easy; just
add index.number_of_replicas: 1 to the Elasticsearch config
13. Elasticsearch Installation
It has two parts: system configuration and Elasticsearch
config file configuration.
System Configuration:
edit /etc/sysctl.conf
fs.file-max = 70000
vm.max_map_count=300000
/sbin/sysctl -p
Add the following lines to the "/etc/security/limits.conf" file:
oracle soft nproc 16384
oracle hard nproc 16384
oracle soft nofile 4096
oracle hard nofile 65536
oracle soft stack 10240
After changing the above configuration, you must reboot the machine
before starting the Elasticsearch server.
14. Elasticsearch Installation
• X-Pack plugin installation
• From enterprise-grade security and developer-friendly APIs to machine learning and
graph analytics, the Elastic Stack ships with features formerly packaged as X-Pack.
• Note: Make sure the Elasticsearch version and the X-Pack version are the same;
otherwise you will get an error.
• X-Pack plugin installation step:
• [12:48:22 oracle@test2 bin]$ ./elasticsearch-plugin install
file:/home/oracle/ELK/x_Pack/x-pack-6.2.2.zip
• Plugin list check command:
• ./elasticsearch-plugin list
15. Elasticsearch Installation
• Config File Configuration:
• Every product of ELK (Elasticsearch, Logstash and Kibana) reads its configuration
from a config file.
• cluster.name: my-application → Use a descriptive name for your cluster
index.number_of_shards: 5 → Default value is 5
index.number_of_replicas: 0
node.name: node-2 → Use a descriptive name for the node
path.data: /home/oracle/ELK/data → Path to the directory where the data is stored
(separate multiple locations by comma)
path.logs: /home/oracle/ELK/log → Path to log files
path.repo: /home/oracle/ELK/backup → Path for backup snapshots
bootstrap.memory_lock: true → Lock the memory on startup
network.host: 192.168.56.103 → Set the bind address to a specific IP (IPv4 or
IPv6)
discovery.zen.ping.unicast.hosts: ["192.168.56.103", "192.168.56.101"] →
Node IP addresses
discovery.zen.minimum_master_nodes: 1
transport.host: localhost → Should be localhost or 127.0.0.1
transport.tcp.port: 9300
http.port: 9200
node.master: true
node.data: true
xpack.graph.enabled: true
xpack.logstash.enabled: true
xpack.ml.enabled: true
xpack.monitoring.enabled: true
xpack.watcher.enabled: true
16. Elasticsearch GET API, PUT API, POST API, DELETE API and bulk
upload
• Example of the GET API
• [16:41:12 oracle@ansible elasticsearch-6.2.2]$ curl -XGET
'http://192.168.56.103:9200/car/car/AXkFe2kBPzPPA2iKNVqD?pretty'
{
"_index" : "car",
"_type" : "car",
"_id" : "AXkFe2kBPzPPA2iKNVqD",
"_version" : 1,
"found" : true,
"_source" : {
"powerPS" : 100,
"@version" : "1",
"minPrice" : 1500,
"sdPrice" : 2261.647559,
"path" : "/home/oracle/ELK/logstash/car.csv",
"@timestamp" : "2019-03-14T07:05:29.692Z",
"host" : "test2",
"message" : "599,150000,2009,100,1500,28500,6066.1219,2261.647559r",
"km" : 150000,
"avgPrice" : 6066.1219,
"year" : 2009,
"count" : 599,
"maxPrice" : 28500
}
}
• Key Note:
• The 1st "car" is the Index → Database
• The 2nd "car" is the Type → Table
• The 3rd part, AXkFe2kBPzPPA2iKNVqD, is the ID
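The key note above can be checked mechanically: the GET response is plain JSON, so splitting it into the addressing triple and the document body is one `json.loads` away. The `parse_get_response` helper is my own, for illustration.

```python
import json

def parse_get_response(body: str):
    """Split a GET API response into its (index, type, id) triple and _source."""
    doc = json.loads(body)
    return (doc["_index"], doc["_type"], doc["_id"]), doc["_source"]

# A trimmed copy of the response shown above.
response = ('{"_index": "car", "_type": "car", '
            '"_id": "AXkFe2kBPzPPA2iKNVqD", "_version": 1, "found": true, '
            '"_source": {"powerPS": 100, "km": 150000, "year": 2009}}')
```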
17. Elasticsearch GET API, PUT API, POST API, DELETE API and bulk
upload
Example of the PUT and POST APIs
curl -XPUT 'http://192.168.56.103:9200/twitter/_doc/1?pretty'
curl -XPOST 'http://192.168.56.103:9200/twitter/_doc?pretty'
•Both insert documents into the index. If you use PUT you must
supply the unique id; if you don't, the insert fails. To avoid this
scenario, just use the POST method, which lets Elasticsearch
auto-generate the id.
Example of the DELETE API
curl -XDELETE
'http://192.168.56.103:9200/car/car/hY6ncWkB43xMmVfK_oJV'
{"_index":"car","_type":"car","_id":"hY6ncWkB43xMmVfK_oJV",
"_version":2,"result":"deleted","_shards":
{"total":2,"successful":2,"failed":0},"_seq_no":348,"_primary_term":5}
18. Elasticsearch GET API, PUT API, POST API, DELETE API and bulk
upload
Example of bulk upload
•For importing large volumes of data, Elasticsearch provides the
Bulk API. A single record's failure doesn't impact the whole insert
operation.
curl -s -H 'Content-Type: application/x-ndjson' -XPOST
192.168.56.103:9200/test/_bulk?pretty --data-binary
@actresses.json
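The `--data-binary` file above must be in NDJSON form: one action line, then one document line, per record, with a trailing newline. A hypothetical helper that builds such a body:

```python
import json

def to_bulk_ndjson(index: str, doc_type: str, docs) -> str:
    """Build the newline-delimited body the _bulk endpoint expects:
    an action line followed by the document source, one pair per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline
```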
19. Elasticsearch Repository snapshot backup and restoration
• The main prerequisite for taking a snapshot is a defined repository. To set
up the repository you need to configure path.repo in the elasticsearch.yml file:
• path.repo: ["/home/oracle/ELK/backup"]
Advantages of snapshots
• Snapshots are incremental
• This reduces both time and disk usage
• Managed at the cluster level
20. Elasticsearch Repository snapshot backup and restoration
Creating repository command:
curl -H 'Content-Type: application/json' -XPUT
'http://192.168.56.101:9200/_snapshot/my_repository' -d '{
"type": "fs",
"settings": {
"location": "my_repository",
"compress": true
}
}'
The output should be:
{"acknowledged":true}
Description:
type: fs (shared file system); it can also be S3, HDFS (Hadoop), Azure cloud or
Google cloud.
compress: true means only the metadata is compressed, not the data
•Verify repository command:
• curl -XGET 'http://192.168.56.101:9200/_snapshot/my_repository'
21. Elasticsearch Repository snapshot backup and restoration
Creating a snapshot:
curl -H 'Content-Type: application/json' -XPUT
"http://192.168.56.101:9200/_snapshot/my_repository/snap_1?
wait_for_completion=true&pretty" -d '{
"indices": ".watches",
"ignore_unavailable": "true",
"include_global_state": false
}'
snap_1: the snapshot name
indices: the index name(s); for multiple indices use "test1,test2"
ignore_unavailable: true means the snapshot will not fail if an index is missing
include_global_state: false means the cluster's global state is not included in
the snapshot
Verify snapshot:
curl -XGET "http://192.168.56.101:9200/_snapshot/my_repository/snap_1?pretty"
23. Elasticsearch Head plugin
The Elasticsearch Head plugin for Google Chrome is a very useful
tool: you can run all types of select queries, delete indices,
and analyse indices.
24. Why is Elasticsearch very fast?
Percolation
•A user registers queries in Elasticsearch, and each new record inserted into
Elasticsearch is matched against them; this is the percolation state.
It is like the reverse of the usual index-then-search operation. A real-world
example: a huge volume of network logs is generated, and the logs need to be
stored in an Elasticsearch index while real-time alarms are generated from
them. These logs contain a severity (1-5) and an event type. With
Elasticsearch's percolation mechanism, data is matched at insert time against
the user's pre-registered queries.
So any matching query triggers at the moment a document is inserted into
Elasticsearch, and data is distributed according to the pre-registered queries.
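The percolation flow above, registering queries first and matching each incoming record at insert time, can be sketched with plain predicates. The query names and log fields below are invented for the network-log example.

```python
# Queries are registered up front; each new log record is matched against
# them as it is inserted -- the reverse of index-then-search.
registered_queries = {
    "critical-alarm": lambda log: log["severity"] >= 4,
    "link-down-alarm": lambda log: log["event"] == "link_down",
}

def percolate(log: dict) -> list:
    """Return the names of every registered query the new record matches."""
    return [name for name, matches in registered_queries.items() if matches(log)]
```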
25. Why is Elasticsearch very fast?
Inverted index
An inverted index is a database index storing a mapping from content, such as
words or numbers, to its locations in a table, or in a document or a set of
documents. The purpose of an inverted index is to allow fast full-text
searches, at a cost of increased processing when a document is added to the
database. The inverted file may be the database file itself, rather than its
index. It is the most popular data structure used in document-retrieval
systems, used on a large scale, for example, in search engines such as
Elasticsearch.
In the example below, if a user searches for the word "going", it is looked
up directly and found in document 1.
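A minimal sketch of such an inverted index, built from two toy documents; searching for "going" becomes a single dictionary lookup rather than a scan of every document:

```python
from collections import defaultdict

docs = {
    1: "we are going home",
    2: "search is very fast",
}

# Indexing: map every word to the set of documents containing it.
# This is the extra work paid when a document is added.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted[word].add(doc_id)

def lookup(word: str):
    """Full-text search as a single dictionary lookup."""
    return sorted(inverted.get(word, set()))
```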
26. Strengths and limitations of Elasticsearch
• The strengths of Elasticsearch are as follows:
• Very flexible Query API:
• It supports a JSON-based REST API.
• Clients are available for all major languages, such as Java, Python, PHP, and so on.
• It supports filtering, sorting, pagination, and aggregations in the same query.
• Highly scalable:
• Clustering, replication of data, and automatic failover are supported out of the box and are
completely transparent to the user. For more details, refer to the Availability and Horizontal
Scalability section.
• Multi-language support:
• We discussed how stemming works and why it is important to remove the difference between the
different forms of root words. This process is completely different for different languages.
Elasticsearch supports many languages out of the box.
• Aggregations:
• Aggregations are one of the reasons why Elasticsearch is like nothing else out there.
• It comes with a very powerful analytics engine, which can help you slice and dice your data.
• It supports nested aggregations. For example, you can group users first by the city they live in,
then by their gender, and then calculate the average age of each bucket.
• Performance:
• Due to the inverted index and its distributed nature, it is extremely high performing. Queries
you traditionally run using a batch processing engine, such as Hadoop, can now be executed in real
time.
• Intelligent filter caching:
• The most recently used queries are cached. When the data is modified, the cache is invalidated
automatically.
27. Strengths and limitations of Elasticsearch
The limitations of Elasticsearch are as follows:
•Not real time - eventual consistency (near real time):
• The data you index is only available for search after 1 second. A process
known as refresh wakes up every 1 second by default and makes the
data searchable.
•Doesn't support SQL-like joins, but provides parent-child and
nested documents to handle relations.
•Doesn't support transactions and rollbacks:
Transactions in a distributed system are expensive. Instead, Elasticsearch
offers version-based control to make sure an update is happening on
the latest version of the document.
•Updates are expensive:
An update to an existing document deletes the document and
re-inserts it as a new document.
•Elasticsearch might lose data due to the following reasons:
• Network partitions.
• Multiple nodes going down at the same time.
28. Two successful use cases for Elasticsearch
NASA uses Elasticsearch for finding geothermal parameters on Mars.
•In the exploration of Mars, NASA sent a Red Planet rover to Mars, and they
get billions of data points and images from its sensors. So it is a very tough
task for NASA to handle and make decisions from this telemetry data. To operate
the Mars rover from Earth, they need to know geothermal parameters; any mistake
in the geothermal data could cost the 2-billion-dollar machine. To cope with
this large-scale data analytics, NASA uses the Elasticsearch engine.
For more details please follow the link below:
https://www.elastic.co/elasticon/2015/sf/unlocking-interplanetary-
datasets-with-real-time-search
29. Two successful use cases for Elasticsearch
Uber car finding.
•The main point here is to take users' geo-location data, share the
closest passenger's location with a driver, and book the car within
seconds. Elasticsearch helps this type of business a lot: it stores
real-time passenger geo-locations (longitude and latitude), matches
the closest available driver's location, and sends the user's location
to the driver.
For more details please check the link below:
•https://www.infoq.com/presentations/uber-elasticsearch-
clusters
30. Upcoming …
The next session is a deep dive into
Beats, Logstash and Kibana.
After that, in the final session, I will walk through a few demos:
* A raw file uploaded into Elasticsearch, with some filtering in
Logstash, and a report created using Kibana
* A log file uploaded into Elasticsearch using Filebeat and
Logstash
* Data uploaded from a relational database into Elasticsearch
using Logstash
The main goal of this operation is to
visualize the data in Kibana, no matter what the
source is.
End