Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as an example and show you how to build an end-to-end analytics solution.
5. Amazon Elasticsearch Service
• Open-source distributed index
• Managed service using Elasticsearch and Kibana
• Fully managed; zero admin
• Highly available and reliable
• RESTful API for easy integration
6. Amazon Elasticsearch Service Leading Use Cases

Log analytics & operational monitoring
• Monitor the performance of applications, web servers, and hardware
• Easy-to-use, powerful data visualization tools to detect issues quickly
• Dig into logs in an intuitive, fine-grained way
• Kibana provides fast, easy visualization

Search
• Application or website provides search capabilities over diverse documents
• Tasked with making this knowledge base searchable and accessible
• Text matching, faceting, filtering, fuzzy search, autocomplete, highlighting, and other search features
• Query API to support application search
7. Leading enterprises trust Amazon Elasticsearch Service for their search and analytics applications
(Customer logos grouped by segment: Media & Entertainment, Online Services, Technology, Other)
8. Adobe Developer Platform (Adobe I/O)

Problem
• Cost-effective monitoring for a very large volume of log data
• Over 200,000 API calls per second at peak - destinations, response times, bandwidth
• Integrate seamlessly with other components of the AWS ecosystem

Solution
• Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using the Amazon ES Kibana endpoint
• The Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges

Benefits
• Management and operational simplicity
• Flexibility to try out different cluster configurations during dev and test

(Architecture diagram: data sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service)
10. An index is a collection of documents, divided into shards
(Diagram: documents with IDs flow through indexing and compression into an index of four shards: Shard 1, Shard 2, Shard 3, Shard 4)
11. Deployment of indices to a cluster
• Index 1
– Shard 1
– Shard 2
– Shard 3
• Index 2
– Shard 1
– Shard 2
– Shard 3
(Diagram: an Amazon ES cluster of three instances, with Instance 1 as master; each index's primary and replica shards are distributed across the instances)
12. How many instances?
The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica
Size based on storage requirements
• Either local storage or up to 512 GB of Amazon Elastic Block Store (EBS) per instance
• Example: a 2 TB corpus (4 TB with one replica) will need 8 instances, assuming EBS storage
• With i2.2xlarge nodes using 1.6 TB of ephemeral storage each, 4 nodes would be enough
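The sizing rule of thumb above can be sketched in a few lines. This is a minimal sketch of the slide's arithmetic only (index size ≈ corpus size, doubled per replica, divided by per-node storage); it ignores operating-system and Elasticsearch overhead, so round up further for headroom in practice.

```python
import math

def data_nodes_needed(corpus_gb, replicas=1, storage_per_node_gb=512):
    """Estimate data node count from the slide's rule of thumb:
    index size is about the corpus size, and each replica doubles it."""
    total_gb = corpus_gb * (1 + replicas)
    return math.ceil(total_gb / storage_per_node_gb)

# The slide's example: a 2 TB corpus with one replica on 512 GB EBS nodes.
print(data_nodes_needed(2048))  # 8 instances
```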
13. Cluster with no dedicated masters
(Diagram: three instances hold all primary and replica shards; Instance 1 also acts as the master)
14. Cluster with dedicated masters
(Diagram: dedicated master nodes manage the cluster, while the data nodes - Instances 1 through 3 - hold the shards and handle queries and updates)
15. Cluster with zone awareness
(Diagram: four instances split across Availability Zone 1 and Availability Zone 2, with each shard's primary and replica placed in different zones)
16. Best practices
• Data nodes = storage needed / storage per node
• Use gp2 EBS volumes
• Use 3 dedicated master nodes for production deployments
• Enable zone awareness
• Set indices.fielddata.cache.size = 40 (percent of the JVM heap)
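These best practices can be expressed as a domain configuration for the `boto3` `es` client. This is a sketch under assumptions: the domain name, instance types, and counts are illustrative, not from the deck; only the dedicated-master count, zone awareness, gp2 volumes, and fielddata setting come from the slide. On Amazon ES, `indices.fielddata.cache.size` is applied through advanced options.

```python
# Hypothetical domain configuration reflecting the slide's best practices.
# Names and instance types are assumptions for illustration.
domain_config = {
    "DomainName": "apache-logs",
    "ElasticsearchClusterConfig": {
        "InstanceType": "m4.large.elasticsearch",
        "InstanceCount": 4,
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": "m4.large.elasticsearch",
        "DedicatedMasterCount": 3,      # 3 dedicated masters for production
        "ZoneAwarenessEnabled": True,   # spread replicas across AZs
    },
    "EBSOptions": {"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 512},
    "AdvancedOptions": {"indices.fielddata.cache.size": "40"},  # 40% of heap
}
# The domain would then be created with:
#   boto3.client('es').create_elasticsearch_domain(**domain_config)
```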
17. Amazon Elasticsearch Service benefits
• Easy to use
• Open-source compatible
• Secure
• Highly available
• AWS integrated
• Scalable
19. Kinesis Firehose overview
• Delivery stream: the underlying AWS resource
• Destination: Amazon ES, Amazon Redshift, or Amazon S3
• Record: put records into streams to deliver them to destinations
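Records are typically sent to a delivery stream in batches. A `PutRecordBatch` call is capped at 500 records and 4 MiB, so a minimal batching helper looks like the sketch below; the function name and the stream name `apache-logs` are our assumptions, not part of the Firehose API.

```python
# Chunk encoded log records into PutRecordBatch-sized groups,
# staying under the API limits of 500 records and 4 MiB per call.
MAX_RECORDS = 500
MAX_BYTES = 4 * 1024 * 1024

def batch_records(records):
    batches, current, size = [], [], 0
    for data in records:
        if current and (len(current) >= MAX_RECORDS or size + len(data) > MAX_BYTES):
            batches.append(current)
            current, size = [], 0
        current.append(data)
        size += len(data)
    if current:
        batches.append(current)
    return batches

# Each batch would then be sent with:
#   boto3.client('firehose').put_record_batch(
#       DeliveryStreamName='apache-logs',   # assumed stream name
#       Records=[{'Data': d} for d in batch])
```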
21. Firehose delivery architecture with transformations
(Diagram: a data source sends source records to a Firehose delivery stream, which delivers transformed records to Amazon Elasticsearch Service; source records, transformation failures, and delivery failures are preserved in S3, via a backup S3 bucket and an intermediate Amazon S3 bucket)
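The transformation step in this architecture is a Lambda function that receives base64-encoded source records and returns each one marked `Ok`, `Dropped`, or `ProcessingFailed`. Below is a sketch for the deck's Apache web log scenario; the regex and field names are our assumptions about the log format, not part of Firehose.

```python
import base64
import json
import re

# Apache common log format: host, timestamp, method, path, status, bytes.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def handler(event, context):
    """Firehose transformation Lambda: decode each record, parse the log
    line to JSON, and return it with the required recordId and result."""
    output = []
    for record in event['records']:
        line = base64.b64decode(record['data']).decode('utf-8')
        match = LOG_RE.match(line)
        if match:
            doc = match.groupdict()
            doc['status'] = int(doc['status'])
            payload = (json.dumps(doc) + '\n').encode('utf-8')
            output.append({
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(payload).decode('utf-8'),
            })
        else:
            # Unparseable lines are handed back as failures so Firehose
            # writes them to the S3 error output instead of Amazon ES.
            output.append({
                'recordId': record['recordId'],
                'result': 'ProcessingFailed',
                'data': record['data'],
            })
    return {'records': output}
```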
23. Best practices
• Use smaller buffer sizes to increase throughput, but be careful of concurrency
• Use index rotation based on sizing
• Default stream limits: 2,000 transactions/second, 5,000 records/second, and 5 MB/second
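With index rotation enabled, Firehose appends a UTC timestamp suffix to the destination index name, so each period's data lands in its own index. A sketch of the resulting names (the base name `apache-logs` is an assumption):

```python
from datetime import datetime, timezone

def rotated_index(base, ts, period='OneDay'):
    """Sketch of the index name Firehose writes with rotation enabled:
    OneDay appends the UTC date, OneMonth the year and month."""
    if period == 'OneDay':
        return f"{base}-{ts:%Y-%m-%d}"
    if period == 'OneMonth':
        return f"{base}-{ts:%Y-%m}"
    return base  # NoRotation

print(rotated_index('apache-logs', datetime(2017, 2, 5, tzinfo=timezone.utc)))
# apache-logs-2017-02-05
```

Rotating by day (or month, depending on volume) keeps each index near the sizing target and lets you drop old data by deleting whole indices.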
24. Number of shards = index size / 30 GB
• Define the number of shards when you create the index
• Less is more
• Writes occupy 1 shard; reads occupy all shards
(Diagram: an Amazon ES cluster of three instances, with Instance 1 as master, holding the distributed primary and replica shards)
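The shard-count rule of thumb above is a one-liner; the 30 GB target per shard is the slide's figure, and rounding up keeps every shard at or below it.

```python
import math

def shard_count(index_size_gb, target_shard_gb=30):
    """The slide's rule of thumb: shards = index size / 30 GB.
    Shard count is fixed at index creation, so size it up front."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

print(shard_count(2048))  # a 2 TB index -> 69 shards
print(shard_count(10))    # small indexes still need at least 1 shard
```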
25. Mapping controls how data is indexed
• not_analyzed text is best for Kibana visualizations
• Define a _template to apply to all new indexes
• The template also defines the number of shards
(Diagram: the index writer builds an inverted index mapping terms to document IDs, e.g. delete → 1,3,5; get → 2,3,4,6; head → 1,7,9; post → 2,8; put → 24)
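An index template tying these points together might look like the sketch below, in the Elasticsearch 2.x syntax current when this deck was written. The template name pattern `apache-logs-*` and the shard count are assumptions for illustration; the dynamic template maps every string field as `not_analyzed` so Kibana can bucket on raw values.

```python
import json

# Hypothetical index template: applies to all new indexes matching the
# pattern, sets the shard count, and makes string fields not_analyzed.
template = {
    "template": "apache-logs-*",   # assumed index name pattern
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {"strings": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"}
                }}
            ]
        }
    }
}
# Registered with the cluster via: PUT _template/apache-logs
print(json.dumps(template, indent=2))
```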
28. Best practices
• Use a template for settings
• Set the number of shards based on 30 GB per shard
• Best case: 1 active shard per node
• For analysis use cases, set not_analyzed on all string fields
30. Amazon ES aggregations
• Buckets - a collection of documents meeting some criterion
• Metrics - calculations on the content of buckets
• Example - bucket: time; metric: count
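The time-bucket/count example can be written as a query body. This is a sketch under assumptions: the field names `@timestamp` and `status` are illustrative for an Apache-log index, and the `interval` syntax is the Elasticsearch 2.x form used in this era.

```python
import json

# A date_histogram bucket per hour (the x-axis), with a sub-aggregation
# counting server errors in each bucket. Field names are assumptions.
query = {
    "size": 0,   # aggregations only, no document hits
    "aggs": {
        "per_hour": {                    # bucket: time
            "date_histogram": {"field": "@timestamp", "interval": "hour"},
            "aggs": {
                "errors": {              # metric-style sub-aggregation
                    "filter": {"range": {"status": {"gte": 500}}}
                }
            }
        }
    }
}
# Sent to the cluster as the body of: POST apache-logs-*/_search
print(json.dumps(query))
```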
31. Best practices
• Make sure that your fields are not_analyzed
• Visualizations are based on buckets and metrics
• Use a histogram on the x-axis first, then sub-aggregate
32. Run Elasticsearch in the AWS cloud with Amazon Elasticsearch Service
• Use Kinesis Firehose to ingest data simply
• Kibana for monitoring, Elasticsearch queries for deeper analysis
33. Find out more:
https://aws.amazon.com/elasticsearch-service/
AWS Centralized Logging:
https://aws.amazon.com/answers/logging/centralized-logging/
Elasticsearch at the AWS Database Blog:
https://aws.amazon.com/blogs/database/category/elasticsearch/
Or ask your Solutions Architect!